Possible Future Extensions

Possible Future Extensions

Andre Wiethoff
Hello everybody,

I am Andre Wiethoff, the person behind Exact Audio Copy and Easy Audio
Copy. My interest is mostly in audio and nowadays I also do quite some
research on personalized music recommender systems.

I have come to the conclusion that, for the time being, collaborative
filtering is the approach most likely to produce good recommendations.
Even though AcousticBrainz produces impressive results, it is not yet
good enough to be used on its own for choosing a good playlist for the
user. In my opinion, it may take another 10 years before we have
analysis algorithms that are accurate enough for production use.

It would create great value for the user if an application would be able
to create a playlist for the given user depending on his preferences.
Not only for streaming, but also using his local music collection. For
now, the necessary data are all closed, available only to the
owning companies (and sometimes available through an API with big
restrictions on what may be done with the data - some algorithms cannot
even be run if the complete database is not available, e.g. doing
collaborative filtering with a self-written algorithm is not possible
with any of the APIs (as far as I know)). Users are submitting their
information for free and the companies lock it away - there should be
an open counterpart to stop this closed handling of information.
(And of course such a database would need some lobbying work with the
larger commercial player companies like Sonos to get them to add
submission to their players.)

As a possible future extension it would be great if such information
(e.g. scrobbling and, even more important, personal ratings of songs for
each user) could also be stored in MusicBrainz. For that, a unique token
per user needs to be created (more or less automatically; it should be
easy for the user to create from within any end-user application, and it
should be the same across all applications the user uses). Creating
such a token in a simple way is quite difficult (when a token already
exists in one application, another application needs to retrieve that
token instead of creating a new one - this will help when moving the
personal information to a new player). And it should be easy enough
that commercial player software/hardware would also be willing to
implement it. Perhaps the best idea would still be to use username/password?
Further, AcoustIDs should be used to unambiguously identify the specific
song played (other metadata could verify the correctness of the
assignment). Last.fm scrobbles by metadata only (as far as I know),
which might cause conflicts between different versions of a song (e.g.
radio edit, live, etc.).
Finally, the ratings should reflect various levels of liking (I propose
-2, -1, 0, 1, 2 - perhaps best displayed as thumbs up/medium up, etc.),
as the more songs the user is able to rate, the better the results of
the recommendation engine will be. If only thumbs up/down is offered,
people will not rate songs that are merely OK (and not rating a song
should provide no (implicit) rating at all, as the user could e.g. let
the songs play unattended).
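
To make the idea concrete, here is a minimal sketch of user-based
collaborative filtering on a -2..2 scale, where an unrated song is simply
absent rather than counted as 0. The user names, recording labels and
ratings are invented for illustration, and cosine similarity is only one
possible choice of similarity measure, not something fixed by this proposal.

```python
from math import sqrt

# Hypothetical ratings on the proposed -2..2 scale.
# A missing key means "not rated" and carries no implicit rating.
ratings = {
    "alice": {"rec_a": 2, "rec_b": -1, "rec_c": 1},
    "bob": {"rec_a": 2, "rec_b": -2, "rec_d": 2},
    "carol": {"rec_a": -2, "rec_b": 2, "rec_d": -1},
}


def similarity(u, v):
    """Cosine similarity over the recordings that both users rated."""
    shared = set(u) & set(v)
    num = sum(u[r] * v[r] for r in shared)
    den = sqrt(sum(u[r] ** 2 for r in shared)) * sqrt(sum(v[r] ** 2 for r in shared))
    return num / den if den else 0.0


def predict(user, recording):
    """Similarity-weighted average of the other users' ratings for one recording."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or recording not in theirs:
            continue
        weight = similarity(ratings[user], theirs)
        num += weight * theirs[recording]
        den += abs(weight)
    return num / den if den else None  # None: no basis for a prediction


print(predict("alice", "rec_d"))
```

Here alice agrees with bob and disagrees with carol, so bob's liking and
carol's dislike of rec_d both push the prediction for alice upwards. This
kind of computation needs the ratings of all users at once, which is why a
per-user API is not enough.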

Of course at the beginning there will be no software that fills the
database with information, but I think there needs to be a framework
first before any audio player would integrate the API to provide
information about the songs a user plays (and hopefully likes or dislikes).

Only after the database has been populated quite a bit will the player
applications receive something back for submitting information. I would
propose that a baseline recommendation engine should also be
implemented in MusicBrainz, which can be called via the API.
Researchers or developers who want to create better recommendation
engines would be free to work on the full database and implement their
own ideas. Of course creating a baseline recommendation engine is still
a huge project, but there are some shortcuts which would make creating
playlists somewhat easier...

What are your thoughts on this?

Thanks for your time!

Best regards, Andre Wiethoff

PS: Quite some time ago I proposed adding AcoustID submission to EAC
(submitting the fingerprints of each extracted CD together with the
DiscID), directly to one of the developers (I no longer remember to
whom), but didn't receive any reply. If something like this is of
interest, please contact me (privately?).



_______________________________________________
MusicBrainz-devel mailing list
[hidden email]
http://lists.musicbrainz.org/mailman/listinfo/musicbrainz-devel

Re: Possible Future Extensions

mayhem
Administrator

> On May 4, 2015, at 12:44, Andre Wiethoff <[hidden email]> wrote:
>
> Hello everybody,
>
> I am Andre Wiethoff, the person behind Exact Audio Copy and Easy Audio
> Copy. My interest is mostly in audio and nowadays I also do quite some
> research on personalized music recommender systems.

Hi!

We're quite busy with upcoming releases, so it's going to take a bit to respond.

I hope to respond sometime next week.

Sorry!

--

--ruaok         Excel is not a database!

Robert Kaye     --     [hidden email]     --    http://musicbrainz.org



Re: Possible Future Extensions

Daniel Sobey
In reply to this post by Andre Wiethoff
Hello Andre,

This is something that I would like to do eventually but I have not gotten around to starting.

MusicBrainz has user collections; this feature allows people to add releases to a public or private list.
For the moment this only works for releases (i.e. albums) and events, but I believe that as part of the schema change scheduled for the 18th of May this will be changed to allow more types of data to be stored.

MusicBrainz also has a rating system that allows people to submit a number between 1 and 5 for most entries, but not many people use it.

One thing missing is that collections would not be able to store play counts, so you would need to store that history somewhere else.

We do not necessarily need to write our own scrobbling API; it might be useful to use last.fm or libre.fm.
Last.fm has APIs, so it should be possible to pull information out of there or push it in easily.
libre.fm is open source but it has mostly been dormant for the last few years.

The way that I am thinking of implementing this sort of system is to create a website where a user can log in to MusicBrainz, last.fm and libre.fm; the website would sit in the middle and try to best match scrobbles on last.fm with recordings on MusicBrainz.
Once we have the data matched to MusicBrainz, the task of the recommendation engine should be a little easier.
It should also allow a little more portability and allow someone else to build a competing recommendation engine but still use my tool as a bridge between these services.
One thing that I want as feature 0 is the ability for users to export their data and import it into their own instance, so if and when my service shuts down everyone can run the software on their own hardware or go to someone else who hosts it.
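
A rough sketch of the matching step, assuming scrobbles arrive as plain artist/title tags and a local index of recordings is available. The index contents and recording IDs below are made-up placeholders, not real MusicBrainz data, and the normalisation rule is only one simple possibility.

```python
import re
from collections import defaultdict


def norm(text):
    """Case-fold, drop punctuation and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.casefold()).split())


# Hypothetical local index: (artist, title) -> list of recording IDs.
index = defaultdict(list)
for rec_id, artist, title in [
    ("rec-1", "Example Artist", "Some Song"),
    ("rec-2", "Example Artist", "Some Song"),  # e.g. a live version with identical tags
    ("rec-3", "Other Artist", "Another Song"),
]:
    index[(norm(artist), norm(title))].append(rec_id)


def match_scrobble(artist, title):
    """Return all candidate recordings; more than one means the tags are ambiguous."""
    return index.get((norm(artist), norm(title)), [])


print(match_scrobble("example artist", "Some Song!"))
```

When more than one candidate comes back, the tags alone cannot decide which recording was played, so the bridge would need extra information (duration, release, or a fingerprint) or would have to leave the scrobble unmatched.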

Anyway happy to hear your thoughts.

Regards,

Daniel



Re: Possible Future Extensions

Andre Wiethoff
Hello Daniel,

thanks for your reply and your thoughts!

> This is something that I would like to do eventually but I have not gotten around to starting.
> Musicbrainz has user collections, this feature allows people to add releases to a public or private list.
> For the moment this only works for Releases (ie albums) and events but I believe as part of the schema change scheduled for the 18th of May this will be changed to allow for more types of data to be stored.

I haven't found any information/description about the user in the database schema description on the homepage.
Of course it would be nice to extend the user concept so that (fairly) anonymous users with a user token (and a password hash) could also log in and store their information with MusicBrainz.

> Musicbrainz also has a rating system that allows people to submit a number between 1 and 5 for most entries but there is not a lot of people that use this

I also found no information about this on the webpage... I assume that this data is stored in the derived data database?
If so, we would need to think about putting the scrobbling and rating data under a different license, as I really would like to see that data free for any use (otherwise some uses are restricted for some players). In any case, one possibility would be to offer the baseline web service for recommending songs for free for non-commercial use and under license for commercial projects...

> One thing missing is collections would not be able to store play counts so you would need to store that history somewhere else.
> We do not necessarily need to write our own scrobbling api, it might be useful to use last.fm or libre.fm
> Last.fm has api's so it should be possible to pull or push information out of there easily.
> libre.fm is open source but it has mostly been dormant for the last few years.

Last.fm has some restrictions. You are able to query the scrobbling history for a single user, but this would only help for exporting that one user's data to a different database (which is not allowed). Also, use of last.fm is subject to special terms of service ( http://www.lastfm.de/api/tos ), e.g. that they have "the right to share in revenue generated from your use of Last.fm Data" or "You must not sub-license the Last.fm Data to others" or "For more information about how to apply to use the API and Last.fm Data for commercial purposes..." or "You are permitted to use the Last.fm Data solely for non-commercial purposes and for no other purpose and subject always to any limitations or conditions as advised to You by Last.fm at any time" or "You will not make more than 5 requests per originating IP address per second, averaged over a 5 minute period, without prior written consent". I think that these restrictions are enough to rule out using the data in the last.fm database for a collaborative recommendation engine...
I didn't know of libre.fm; they have a large support community ("Libre.fm is supported by Bytemark, BigV, The Internet Archive and ISC"). The last commit was from May 2014, so there has been no development at all for the last year. Further, I think they have a kind of closed ecosystem for promoting "open" artists, whose music can be played directly from their webpage (sadly there was not much information about the service itself on their webpage). Additionally, merging their tags into MusicBrainz would be a bit "untidy" (I don't know a better matching word for that, sorry).
Finally, if one of the services closes (or just limits access to the data), the users' information is probably lost (or will be much harder to integrate into a new service) - I have better hopes for, and more faith in, the MusicBrainz service.

> The way that I am thinking of implementing this sort of system is to create a website where a user can login to musicbrainz, last.fm and libre.fm and the website will be responsible for sitting in the middle and trying to best match scrobbles on last.fm with recordings on musicbrainz.
> Once we have the data matched to musicbrainz the task of the recommendation engine should be a little easier.
> It should also allow a little more portability and allow someone else to build competing recommendation engine but still use my tool as a bridge between these services.

I don't think that it would be too difficult to add the necessary features to the MusicBrainz database.

We would need a user table (which probably exists) with
UserID (an internal database ID)
LoginName (e.g. email address or any other kind of login name)
LoginNameHash (or alike for anonymous access of the user account)
LoginPasswordHash (the users password hash)
UserAllowsAnonymousAccess (a flag indicating whether the user allows other applications/webpages to access his user statistics read-only without specifying his password. There might be discussion about whether his data may be exported in a snapshot in any case (which I would favour): he needs to specify his login credentials on the application/webpage anyway, so that part of the anonymity is gone, while the export would remove his real login name, leaving only statistics).

For scrobbling a new table would be necessary
DateTime (date and time for the scrobbling of the given title)
RecordingID (recording in the Musicbrainz database, best identified using AcoustID and verified with song tags. Of course if an application can only provide metadata/tags of a song, because e.g. the (embedded) platform is not able to create fingerprints fast enough, it should be possible to match the metadata/tags to the Musicbrainz database using that information alone).
UserID (for combining the user with the scrobbling).

And I am not sure whether there is already a rating table for the user (you mentioned one), but it should look like this:
UserID
Rating (from 1-5)
RecordingID  (as the user should rate a specific recording of a song)

This information should already be enough to implement scrobbling and rating (or am I missing something here?).
All of this information should be in the core database as far as the usage license is concerned (except, of course, LoginName and LoginPasswordHash, so that applications can use the data anonymized when parsing the data heap).
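
The three tables could be sketched roughly as follows. This is only an illustration using an in-memory SQLite database from Python; the table and column names are my own placeholders and not the actual MusicBrainz schema (which uses PostgreSQL), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_user (
    user_id                 INTEGER PRIMARY KEY,
    login_name              TEXT UNIQUE NOT NULL,
    login_name_hash         TEXT NOT NULL,
    login_password_hash     TEXT NOT NULL,
    allows_anonymous_access INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE scrobble (
    user_id      INTEGER NOT NULL REFERENCES app_user(user_id),
    recording_id INTEGER NOT NULL,
    played_at    TEXT NOT NULL          -- date and time of the play
);
CREATE TABLE recording_rating (
    user_id      INTEGER NOT NULL REFERENCES app_user(user_id),
    recording_id INTEGER NOT NULL,
    rating       INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 5),
    PRIMARY KEY (user_id, recording_id) -- one rating per user and recording
);
""")

# Invented sample data: one user, one play, one rating.
conn.execute("INSERT INTO app_user VALUES (1, 'someone', 'hash-of-name', 'hash-of-password', 1)")
conn.execute("INSERT INTO scrobble VALUES (1, 42, '2015-05-06T11:08:00Z')")
conn.execute("INSERT INTO recording_rating VALUES (1, 42, 4)")

# Play counts fall out of the scrobble table and need no column of their own.
play_count = conn.execute(
    "SELECT COUNT(*) FROM scrobble WHERE user_id = 1 AND recording_id = 42"
).fetchone()[0]
print(play_count)
```

An anonymized export would then simply leave out login_name and login_password_hash.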

Extending the API would be another topic, of course...
And thinking further, ways to automatically share the user's scrobbles on Twitter/Facebook/etc. should also be considered for future development.

> One thing that I want as feature 0 is the ability to export their data and import in their own instance so if and when my service shuts down everyone can run the software on their own hardware or go to someone else that hosts it.

That is also very important to me; therefore I would like to store the data in the open MusicBrainz database, where a snapshot is always available (and from which the user data can be exported using the (anonymous) hashed user login information). But automatically matching the other services' data for a given user would be very difficult and possibly not allowed (at least with last.fm: "You must not sub-license the Last.fm Data to others").

Please let me know your thoughts on my thoughts ;-)

Robert: No problem, I will be glad of your answer whenever it is possible for you...

Best regards,

Andre

Re: Possible Future Extensions

tommycrock


On 6 May 2015 11:08, "Andre Wiethoff" <[hidden email]> wrote:
>
> Hello Daniel,
>
> thanks for your reply and your thoughts!
>
> I haven't found any information/description about the user in the database scheme description on the homepage.
> Of course it would be nice to extend the users that also (quite) anonymous users with a user token (and a password hash) could login and store their information with Musicbrainz.
>
>> Musicbrainz also has a rating system that allows people to submit a number between 1 and 5 for most entries but there is not a lot of people that use this
>
> I also found no information about this on the webpage... I assume that this data is stored in the derived data database?

Hi Andre

Info about the ratings is in https://wiki.musicbrainz.org/MusicBrainz_Database/Schema#.2A_rating_raw_.26_.2A_meta_tables and https://musicbrainz.org/doc/Rating_System
I think the editor tables are undocumented. https://wiki.musicbrainz.org/MusicBrainz_Database/Schema#Undocumented_tables



Re: Possible Future Extensions

Andre Wiethoff
Hello Tom,

thanks for the links! (I didn't think of looking at the wiki...)
I am rather new to the Musicbrainz database structure, so I am thankful for any insights.

It seems that the rating tables are already as needed. There seems to be a small discrepancy between the two pages you mentioned, though.
One says the rating is between 1-5, the other between 0-100 (I assume 0-100 is used internally?)

I finally found the editor schema in the CreateTables.sql script on GitHub.

It also contains everything that is needed (I think the password field is only a hash and the name field is unique across the database (case insensitive)?). It might be a good idea to add a field with a hash of the name (e.g. in lower case), in order to export a shortened table with id and hashed name which can be used to find a user's statistics anonymously in the downloadable database (even though for downloaded databases the editor id alone would already suffice). As a future extension, one could also add an option controlling whether other pages can access the statistics using the username/name hash only.
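
As a small sketch of what such a name hash could look like, assuming SHA-256 over the lower-cased name (the hash function, the helper name and the sample rows are my own choices for illustration, nothing here is taken from the existing schema):

```python
import hashlib


def name_hash(editor_name):
    """SHA-256 hex digest of the lower-cased editor name."""
    return hashlib.sha256(editor_name.lower().encode("utf-8")).hexdigest()


# Hypothetical (id, name) rows from the editor table.
editors = [(1, "Alice"), (2, "bob")]

# Shortened export table: id plus hashed name, without the real name.
export = [(editor_id, name_hash(name)) for editor_id, name in editors]

# An application that knows the user's login name can find the row again.
wanted = name_hash("ALICE")
found = [editor_id for editor_id, hashed in export if hashed == wanted]
print(found)
```

One caveat: a plain hash of a public user name can be reversed by anyone who hashes a list of candidate names, so this hides the name only from casual inspection.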

That means that to start with we would need only 0-1 additional columns in the editor table and a new table for scrobbling with 3 columns (e.g. editor, recording, date). Everything else could be added later...

I would propose not importing existing scrobbles or ratings, but starting afresh - hopefully the database will populate at a decent speed.
The key factor would be to get some audio player developers (commercial and non-commercial ones) to add support for such a feature...

What do you think?

Best regards,

Andre



Re: Possible Future Extensions

tommycrock

Hi Andre
On 6 May 2015 13:37, "Andre Wiethoff" <[hidden email]> wrote:
>
> Hello Tom,
>
> thanks for the links! (I didn't think of looking at the wiki...)

The pages on the main site often come from the wiki but a "transclusion editor" needs to update which version appears.

> I am rather new to the Musicbrainz database structure, so I am thankful for any insights.
>
> It seems that the rating tables are already as needed. There seem to be a small discrepancy between the two pages you mentioned though.
> One tells the rating is between 1-5, the other between 0-100 (I assume 0-100 will be used internally?)

I think I remember reading that they are 0-100 everywhere. I'm sure someone who knows will say.
I don't know about the rest so I'll leave it for others to answer.

P.S. Thanks for EAC :)



Re: Possible Future Extensions

Ian McEwen
On Thu, May 07, 2015 at 07:05:35AM +0100, Tom Crocker wrote:

> Hi Andre
> On 6 May 2015 13:37, "Andre Wiethoff" <[hidden email]> wrote:
> >
> > Hello Tom,
> >
> > thanks for the links! (I didn't think of looking at the wiki...)
>
> The pages on the main site often come from the wiki but a "transclusion
> editor" needs to update which version appears.
>
> > I am rather new to the Musicbrainz database structure, so I am thankful
> > for any insights.
> >
> > It seems that the rating tables are already as needed. There seem to be a
> > small discrepancy between the two pages you mentioned though.
> > One tells the rating is between 1-5, the other between 0-100 (I assume
> > 0-100 will be used internally?)
>
> I think I remember reading that they are 0-100 everywhere. I'm sure someone
> who knows will say.
> I don't know about the rest so I'll leave it for others to answer.
>
They're 0-100 in the database, but some ways of looking at the data do
still return 1-5 values, somewhat confusingly. The /ws/1/rating
endpoints seem to do 1-5, where the /ws/2/rating ones do 0-100 (which is
perhaps reasonable, since that was a major upgrade). However, adding
?inc=user-ratings in the WS also returns 1-5, and the site interface
only allows setting multiples of 20 (i.e. 1-5 stars), though it'll still
display the other values.

So in principle they're 0-100 everywhere, but in practice there's places
that are basically only 1-5 stars. YMMV, etc.
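
For client code that has to deal with both scales, the conversion can be
sketched like this. The function names are made up for illustration, and
returning fractional stars for scores that are not multiples of 20 is an
assumption about how a client might want to show them, not a description
of what the site or the web service does.

```python
def stars_to_score(stars):
    """Map a 1-5 star rating to the 0-100 scale (multiples of 20)."""
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between 1 and 5")
    return stars * 20


def score_to_stars(score):
    """Map a 0-100 score to stars; scores between multiples of 20 give fractions."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    return score / 20


print(stars_to_score(3), score_to_stars(70))
```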

We discussed some of the particularities and peculiarities of this in a
thread starting at
http://lists.musicbrainz.org/pipermail/musicbrainz-devel/2015-February/005933.html
(though the start of the thread was for another issue)

> P.S. Thanks for EAC :)


Re: Possible Future Extensions

Andre Wiethoff
Hello Ian,

>>> One tells the rating is between 1-5, the other between 0-100 (I assume
>>> 0-100 will be used internally?)
>>
>> I think I remember reading that they are 0-100 everywhere. I'm sure someone
>> who knows will say.
>> I don't know about the rest so I'll leave it for others to answer.
>>
> They're 0-100 in the database, but some ways of looking at the data do
> still return 1-5 values, somewhat confusingly. The /ws/1/rating
> endpoints seem to do 1-5, where the /ws/2/rating ones do 0-100 (which is
> perhaps reasonable, since that was a major upgrade). However, adding
> ?inc=user-ratings in the WS also returns 1-5, and the site interface
> only allows setting multiples of 20 (i.e. 1-5 stars), though it'll still
> display the other values.
>
> So in principle they're 0-100 everywhere, but in practice there's places
> that are basically only 1-5 stars. YMMV, etc.
>
> We discussed some of the particularities and peculiarities of this in a
> thread starting at
> http://lists.musicbrainz.org/pipermail/musicbrainz-devel/2015-February/005933.html
> (though the start of the thread was for another issue)
>
thanks for the information!

I have read through it and I even analysed the tables in a current
snapshot of the MusicBrainz DB.
I found the following:

All tables called "_rating_raw" are empty!
But on the other hand, the "_meta" tables do contain information!!

artist_meta: 25215 ratings
recording_meta: 219029 ratings
work_meta: 1419 ratings
label_meta: 602 ratings
release_group_meta: 65150 ratings

Most probably the "_rating_raw" tables are aggregated and then deleted
afterwards?
I assume that the exact editor/rating values are lost forever? It
would really be a pity to lose so much recording rating information
(groupable by editor/user)...

Would it be possible to not delete the rating information in the future?

Best regards,

Andre

(Just as a side note: I also found that the "_meta" tables additionally
contain a lot of rows which only reference the object
but have no values for "rating" and "rating_count". E.g. the
"recording_meta" table has 14,791,695 entries in total, of which
14,608,329 don't actually contain a rating. It seems that every
recording gets an entry in recording_meta automatically...)




Re: Possible Future Extensions

tommycrock


On 11 May 2015 at 11:08, Andre Wiethoff <[hidden email]> wrote:

> I have read through it and I even analysed the tables in a current
> snapshot of the MusicBrainz DB.
> I found the following:
>
> All tables called "_rating_raw" are empty!
> But on the other hand, the "_meta" tables do contain information!!
>
> artist_meta: 25215 ratings
> recording_meta: 219029 ratings
> work_meta: 1419 ratings
> label_meta: 602 ratings
> release_group_meta: 65150 ratings
>
> Most probably the "_rating_raw" tables are aggregated and then deleted
> afterwards?
> I assume that the exact editor/rating values are lost forever? It
> would really be a pity to lose so much recording rating information
> (groupable by editor/user)...
>
> Would it be possible to not delete the rating information in the future?

Editors can get / set their individual ratings on the site and (I think) through the webservice. So they aren't 'lost'.
As I understand it, user data like this gets removed from the downloadable database. I think it's seen as a privacy issue.
 



Re: Possible Future Extensions

Andre Wiethoff
Hello Tom,

thanks for your answer!

>> All tables called "_rating_raw" are empty!
>> But on the other hand, the "_meta" tables do contain information!!
>> Most probably the "_rating_raw" tables are aggregated and then deleted
>> afterwards?
>> I assume that the exact editor/rating values are lost forever? It
>> would really be a pity to lose so much recording rating information
>> (groupable by editor/user)...
>>
>> Would it be possible to not delete the rating information in the future?
>
> Editors can get / set their individual ratings on the site and (I think) through the webservice. So they aren't 'lost'.
> As I understand it, user data like this gets removed from the downloadable database. I think it's seen as a privacy issue.

This does make sense, as the editor table is also stripped of email addresses and passwords...

So the question is, would it be possible to receive some kind of anonymized rating information (e.g. with a GUID instead of the actual editor id, which could not be tracked or which even changes every time the DB is exported)?

Maybe a good idea for that would be a privacy setting offering one of three options:
1) Export of all editor related information (e.g. for other player software, etc.)
2) Export of anonymized information (e.g. using an editor ID which doesn't actually exist)
3) No export of any editor related information allowed
(And I would lean toward 2) as the default for existing users, but this would need intensive discussion, I think...)
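
To make the idea a bit more concrete, here is a rough sketch of what such an export could look like. Everything in it is hypothetical: the `ExportPreference` names, the `export_ratings` function and the tuple layout are invented for illustration and do not correspond to anything in the MusicBrainz server. It assumes a fresh random GUID per editor for each export run, so pseudonyms cannot be linked across exports.

```python
import uuid
from enum import Enum


class ExportPreference(Enum):
    FULL = 1        # option 1: export with the real editor id
    ANONYMIZED = 2  # option 2: export with a throwaway GUID
    NONE = 3        # option 3: do not export at all


def export_ratings(ratings, preferences, default=ExportPreference.ANONYMIZED):
    """Return (editor_key, recording_id, rating) tuples for one export run.

    ratings:     iterable of (editor_id, recording_id, rating) tuples
    preferences: dict mapping editor_id -> ExportPreference
    """
    pseudonyms = {}  # rebuilt on every call, so GUIDs change per export
    exported = []
    for editor_id, recording_id, rating in ratings:
        pref = preferences.get(editor_id, default)
        if pref is ExportPreference.NONE:
            continue
        if pref is ExportPreference.FULL:
            key = str(editor_id)
        else:
            if editor_id not in pseudonyms:
                pseudonyms[editor_id] = str(uuid.uuid4())
            key = pseudonyms[editor_id]
        exported.append((key, recording_id, rating))
    return exported


sample = [(1, 100, 80), (1, 101, 60), (2, 100, 40), (3, 100, 20)]
prefs = {2: ExportPreference.FULL, 3: ExportPreference.NONE}
first_export = export_ratings(sample, prefs)
second_export = export_ratings(sample, prefs)
```

Within one export, all ratings of editor 1 share the same GUID, so they stay groupable for collaborative filtering, while the GUID differs between `first_export` and `second_export`. Note that swapping the id alone is not a strong privacy guarantee, since a distinctive set of ratings may still identify a person.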

How does this sound?

Best regards,

Andre


Re: Possible Future Extensions

Ian McEwen
In reply to this post by Andre Wiethoff
On Mon, May 11, 2015 at 12:08:32PM +0200, Andre Wiethoff wrote:

> Hello Ian,
>
> >>> One tells the rating is between 1-5, the other between 0-100 (I assume
> >>> 0-100 will be used internally?)
> >>
> >> I think I remember reading that they are 0-100 everywhere. I'm sure someone
> >> who knows will say.
> >> I don't know about the rest so I'll leave it for others to answer.
> >>
> > They're 0-100 in the database, but some ways of looking at the data do
> > still return 1-5 values, somewhat confusingly. The /ws/1/rating
> > endpoints seem to do 1-5, where the /ws/2/rating ones do 0-100 (which is
> > perhaps reasonable, since that was a major upgrade). However, adding
> > ?inc=user-ratings in the WS also returns 1-5, and the site interface
> > only allows setting multiples of 20 (i.e. 1-5 stars), though it'll still
> > display the other values.
> >
> > So in principle they're 0-100 everywhere, but in practice there's places
> > that are basically only 1-5 stars. YMMV, etc.
> >
> > We discussed some of the particularities and peculiarities of this in a
> > thread starting at
> > http://lists.musicbrainz.org/pipermail/musicbrainz-devel/2015-February/005933.html
> > (though the start of the thread was for another issue)
> >
> thanks for the information!
>
> I have read through it and I even analysed the tables in a current
> snapshot of the Musicbrainz DB.
> I found the following:
>
> All tables called "_rating_raw" are empty!
> But on the other hand, the "_meta" tables do contain information!!
>
> artist_meta: 25215 ratings
> recording_meta: 219029 ratings
> work_meta: 1419 ratings
> label_meta: 602 ratings
> release_group_meta: 65150 ratings
>
> Most probably the "_rating_raw" tables are aggregated and then deleted
> afterwards?
> I assume that the exact editor/rating values are lost forever? It
> would really be a pity to lose so much recording rating information
> (groupable by editor/user)...
>
> Would it be possible to not delete the rating information in the future?
>
The _rating_raw tables are the raw ratings, and they correspond to the
private data of exactly which users submitted exactly which ratings. The
_meta tables have the aggregate ratings.

In the real database the _rating_raw tables aren't empty, but the
downloadable snapshots don't include data for those tables, since by
default the exact pairings are private, at the discretion of the editor
in question. So none of that information is absent in the actual
database, it's just not released publicly.
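
As a rough illustration of the raw/aggregate split, the sketch below uses an in-memory SQLite database with heavily simplified, hypothetical versions of the two tables (the real database is PostgreSQL and the real schema has more columns). It also assumes the aggregate is a plain rounded average, which is an assumption made for the example, not a statement about how the server computes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the real tables (assumed layout).
    CREATE TABLE recording_rating_raw (
        recording INTEGER NOT NULL,
        editor    INTEGER NOT NULL,
        rating    INTEGER NOT NULL CHECK (rating BETWEEN 0 AND 100),
        PRIMARY KEY (recording, editor)
    );
    CREATE TABLE recording_meta (
        id           INTEGER PRIMARY KEY,
        rating       INTEGER,
        rating_count INTEGER
    );
""")

# Private data: exactly which editor submitted which rating.
conn.executemany(
    "INSERT INTO recording_rating_raw VALUES (?, ?, ?)",
    [(1, 10, 80), (1, 11, 100), (2, 10, 40)],
)
# Every recording has a meta row; recording 3 has no ratings at all.
conn.executemany("INSERT INTO recording_meta (id) VALUES (?)", [(1,), (2,), (3,)])

# Public data: only the per-recording average and count survive.
conn.execute("""
    UPDATE recording_meta
    SET rating = (SELECT CAST(ROUND(AVG(r.rating)) AS INTEGER)
                  FROM recording_rating_raw r
                  WHERE r.recording = recording_meta.id),
        rating_count = (SELECT NULLIF(COUNT(*), 0)
                        FROM recording_rating_raw r
                        WHERE r.recording = recording_meta.id)
""")

meta_rows = conn.execute(
    "SELECT id, rating, rating_count FROM recording_meta ORDER BY id"
).fetchall()
```

A public dump in this picture would ship `recording_meta` with its contents and `recording_rating_raw` as an empty table, which matches what shows up in the snapshot.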

Similarly, collections (even public ones), tags (other than aggregated
ones) and subscriptions aren't in public dumps.

There's a ticket for releasing files that contain collections,
unaggregated tags and ratings, and subscriptions for editors who have
indicated in their user preferences that they can be made public:
http://tickets.musicbrainz.org/browse/MBS-7560 -- it hasn't seen much
progress yet though. Always too much to do :)

> Best regards,
>
> Andre
>
> (Just as a side note: I also found that the "_meta" tables additionally
> contain a lot of rows which only reference the object
> but have no values for "rating" and "rating_count". E.g. the
> "recording_meta" table has 14,791,695 entries in total, of which
> 14,608,329 don't actually contain a rating. It seems that every
> recording gets an entry in recording_meta automatically...)
>
Yes, these rows are created automatically. While for recordings the
_meta table only contains rating information, for other things (releases
and release groups) it contains other information that's much less
often empty/missing, so it's much easier to just create these rows
automatically with a trigger so that we can count on the rows always
existing, and we don't have to deal with upserts, outer joins, etc.
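
The pattern looks roughly like the sketch below. It uses SQLite through Python so that it runs on its own; the real database is PostgreSQL, and the table layout and the trigger name `recording_create_meta` are simplified assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recording (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE recording_meta (
        id           INTEGER PRIMARY KEY REFERENCES recording(id),
        rating       INTEGER,
        rating_count INTEGER
    );
    -- Create the (initially empty) meta row whenever a recording is added.
    CREATE TRIGGER recording_create_meta
    AFTER INSERT ON recording
    BEGIN
        INSERT INTO recording_meta (id) VALUES (NEW.id);
    END;
""")

conn.execute("INSERT INTO recording (id, name) VALUES (1, 'Example recording')")

# A plain inner join is enough, because the meta row is guaranteed to exist.
joined = conn.execute("""
    SELECT r.name, m.rating, m.rating_count
    FROM recording r
    JOIN recording_meta m ON m.id = r.id
""").fetchall()

# Likewise a plain UPDATE is enough; no insert-or-update logic is needed.
conn.execute("UPDATE recording_meta SET rating = 80, rating_count = 1 WHERE id = 1")
updated = conn.execute(
    "SELECT rating, rating_count FROM recording_meta WHERE id = 1"
).fetchone()
```

The cost is the one noted in the side note above: a meta row with empty rating columns for every recording that has never been rated.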


Re: Possible Future Extensions

Ian McEwen
In reply to this post by tommycrock
On Mon, May 11, 2015 at 11:16:12AM +0100, Tom Crocker wrote:

> On 11 May 2015 at 11:08, Andre Wiethoff <[hidden email]>
> wrote:
>
> >
> > I have read through it and I even analysed the tables in a current
> > snapshot of the Musicbrainz DB.
> > I found the following:
> >
> > All tables called "_rating_raw" are empty!
> > But on the other hand, the "_meta" tables do contain information!!
> >
> > artist_meta: 25215 ratings
> > recording_meta: 219029 ratings
> > work_meta: 1419 ratings
> > label_meta: 602 ratings
> > release_group_meta: 65150 ratings
> >
> > Most probably the "_rating_raw" tables are aggregated and then deleted
> > afterwards?
> > I assume that the exact editor/rating values are lost forever? It
> > would really be a pity to lose so much recording rating information
> > (groupable by editor/user)...
> >
> > Would it be possible to not delete the rating information in the future?
>
>
> Editors can get / set their individual ratings on the site and (I think)
> through the webservice. So they aren't 'lost'.
> As I understand it, user data like this gets removed from the downloadable
> database. I think it's seen as a privacy issue.
They can be gotten for individual things in the webservice with
inc=user-ratings (and user-tags), though there isn't a bulk export. See
http://tickets.musicbrainz.org/browse/MBS-4948 for that.
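
For reference, a per-entity lookup with those includes has roughly the shape built below. This is only a sketch: the helper name `lookup_url` and the all-zero MBID are placeholders invented for the example, and it only constructs the URL. As far as I understand, the user-specific includes only return data on an authenticated request, which this sketch does not attempt.

```python
from urllib.parse import urlencode

WS2_ROOT = "https://musicbrainz.org/ws/2"


def lookup_url(entity, mbid, includes):
    """Build a /ws/2 lookup URL; includes are joined with '+' in the inc parameter."""
    query = urlencode({"inc": "+".join(includes)}, safe="+")
    return f"{WS2_ROOT}/{entity}/{mbid}?{query}"


# Placeholder MBID, not a real recording.
example_url = lookup_url(
    "recording",
    "00000000-0000-0000-0000-000000000000",
    ["user-ratings", "user-tags"],
)
```

Since this is one request per entity, collecting all of an editor's ratings this way means one lookup for every rated item, which is why a bulk export would be a separate feature.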
