MetadatasourceId and ContextBasedGenerator
A MetadatasourceId represents a source of metadata.
More specifically, a MetadatasourceId uniquely identifies
a learning object in a certain context. Such a context can be, for example,
a course in a learning management system or a file in a file system.
Each MetadatasourceId has a corresponding ContextBasedGenerator.
The MetadatasourceId is used to create and initialize the matching
ContextBasedGenerator (selected by the name of the MetadatasourceId), which can
then create metadata for the learning object as it is used in that context.
The MetadatasourceIds and the ContextBasedGenerators will vary from server to server.
Server A may support different MetadatasourceIds than server B. Therefore we
provide an operation that retrieves all supported MetadatasourceIds.
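
The relationship between the two classes, and the operation that lists the
supported MetadatasourceIds, could be sketched roughly as follows. This is a
minimal illustration under stated assumptions, not the interface defined by
this specification: the fields of MetadatasourceId, the generate method, the
two example contexts, and the names GENERATOR_REGISTRY, supported_ids and
create_generator are all hypothetical, and a factory function stands in for
the creation step described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadatasourceId:
    """Identifies a learning object within one context (hypothetical fields)."""
    name: str        # context name, e.g. "file" or "lms-course"
    identifier: str  # what identifies the object inside that context


class ContextBasedGenerator:
    """Base class: generates metadata for an object as used in one context."""

    def __init__(self, source_id: MetadatasourceId):
        self.source_id = source_id

    def generate(self) -> dict:
        raise NotImplementedError


class FileGenerator(ContextBasedGenerator):
    def generate(self) -> dict:
        # Illustrative only: a real generator would inspect the file.
        return {"location": self.source_id.identifier}


class LmsCourseGenerator(ContextBasedGenerator):
    def generate(self) -> dict:
        # Illustrative only: a real generator would query the LMS.
        return {"course": self.source_id.identifier}


# Each server fills this registry with the contexts it supports,
# so its contents differ from server to server.
GENERATOR_REGISTRY = {
    "file": FileGenerator,
    "lms-course": LmsCourseGenerator,
}


def supported_ids() -> list:
    """Return the names of all MetadatasourceIds this server supports."""
    return sorted(GENERATOR_REGISTRY)


def create_generator(source_id: MetadatasourceId) -> ContextBasedGenerator:
    """Create the generator that matches the name of the MetadatasourceId."""
    try:
        generator_class = GENERATOR_REGISTRY[source_id.name]
    except KeyError:
        raise ValueError(f"unsupported MetadatasourceId: {source_id.name}")
    return generator_class(source_id)
```

A client would first call supported_ids() to discover what a given server
offers, and only then build a MetadatasourceId with one of those names.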
Conflict handling and merging information
The idea behind this specification for automatic metadata generation is
that the final metadata instance is a collection of parts that are
generated by different metadata generators. Those generators can be,
for example, different classes in one application. They can also be
several web applications, residing on different servers, that each
return part of the metadata, which is then combined, e.g. by the client
or by another web application.
Because different generators each generate part of the metadata, these
subsets have to be combined into one resulting metadata record for the
learning object. Because the subsets can overlap, generators may produce
different values for the same element, a conflict that has to be resolved.
There are several strategies for resolving such conflicts; depending on the
element, one strategy might work better than another.
- One option would be to include all the values in the resulting set.
This is the easiest to implement and might be feasible for some
metadata elements. For example, a list of concepts could contain
all the keywords extracted by several generators. In some
systems, however, the metadata set is strictly defined so we cannot
implement this as an overall strategy for all the elements.
- A second option would be to ask the user how the merging has to happen.
This can be used in a small system with only a small number of new
entries per week or month. In larger systems, however, we would
lose the benefits of automatic metadata generation, as the user has
to spend time checking all the values and deciding which one to use.
- A third option would be to try to find out which generator is
most likely to be correct and use its value in the result. In
this case, every generated value gets an associated value that expresses
the generator's degree of certainty about it. We call this
value the confidence value in our framework. Every generator
determines such a value for the metadata elements it generates.
In case of conflict, this strategy prefers a value with a higher
confidence value over one with a lower value.
- A fourth option would be to apply heuristics to decide on the value.
This option applies only when heuristics are known for the
metadata element in question. In that case, the heuristic resolves the
conflict. An example element for which
heuristics can apply is the document language. Many families
of languages exist, and within a family the differences between
languages might be very small. For example, Italian and Catalan
are closely related to each other but are different languages; the same
applies to Afrikaans and Dutch. If one metadata generator decides
the language is Catalan and another decides it is Italian, the heuristic
might say to use Italian. In either case, whether the document is used
in an Italian or a Catalan environment, the users will likely understand
the contents and thus be able to use the object. Using Catalan as the
document language could be more precise, but the value Italian is not wrong.
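
The first, third and fourth options above could be sketched roughly as
follows, with one strategy chosen per metadata element. This is only an
illustration under stated assumptions, not the framework's implementation:
the Candidate type, the function names, the STRATEGIES table, the [0, 1]
range of the confidence value, the ISO 639-1 codes and the fallback to the
confidence value inside the heuristic are all hypothetical. The second option
(asking the user) is interactive and is left out.

```python
from collections import defaultdict
from typing import NamedTuple


class Candidate(NamedTuple):
    element: str       # metadata element name, e.g. "title"
    value: str         # value proposed by one generator
    confidence: float  # generator's certainty, assumed to lie in [0, 1]


def keep_all(candidates):
    """First option: keep every distinct value, in first-seen order."""
    values = []
    for candidate in candidates:
        if candidate.value not in values:
            values.append(candidate.value)
    return values


def highest_confidence(candidates):
    """Third option: prefer the value with the highest confidence value."""
    return max(candidates, key=lambda c: c.confidence).value


# Hypothetical heuristic table; the single entry mirrors the Catalan/Italian
# example and uses ISO 639-1 codes.
PREFERRED_LANGUAGE = {"ca": "it"}


def language_heuristic(candidates):
    """Fourth option: replace each value by the one the heuristic prefers."""
    mapped = [
        c._replace(value=PREFERRED_LANGUAGE.get(c.value, c.value))
        for c in candidates
    ]
    # Closely related languages now share one value; if several values
    # remain, fall back to the confidence value (an assumption of this sketch).
    return highest_confidence(mapped)


# Strategy per element; elements not listed here use highest_confidence.
STRATEGIES = {"keyword": keep_all, "language": language_heuristic}


def merge(candidates):
    """Combine the candidates of all generators into one metadata record."""
    grouped = defaultdict(list)
    for candidate in candidates:
        grouped[candidate.element].append(candidate)
    return {
        element: STRATEGIES.get(element, highest_confidence)(group)
        for element, group in grouped.items()
    }


record = merge([
    Candidate("title", "Intro to XML", 0.9),
    Candidate("title", "intro.ppt", 0.4),
    Candidate("keyword", "xml", 0.7),
    Candidate("keyword", "markup", 0.6),
    Candidate("keyword", "xml", 0.5),
    Candidate("language", "ca", 0.8),
    Candidate("language", "it", 0.6),
])
```

With this made-up input, the keyword element keeps both distinct values, the
title takes the value reported with confidence 0.9, and the language element
resolves to "it".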
To support these options, we introduced what we call
"conflict handling methods" (or "conflict handling strategies").