Ranking DDI 3 metadata items in search results

We were recently at UW-Madison to visit with Dr. Barry T. Radler and the MIDUS longitudinal survey team. We are working with them to document their series of longitudinal studies in DDI 3 so that very detailed, cross-linked codebooks can be generated. During our meeting Dr. Radler mentioned that the ordering of search results on Colectica Web could be improved, since results were ordered solely by information contained in each individual item.

Colectica Web offers faceted searching of DDI items, and even searching within arbitrary sets such as a specific study, instance, package, scheme, etc. The question is how to return the results with the most relevant DDI 3 item listed first. DDI 3 allows for massive reuse of items through its referencing mechanisms. For example, the same concept can be used to describe multiple questions or the same code scheme can be the representation of many different variables. Colectica Repository tracks all of this extra relationship and contextual information about DDI items, so we decided to use it in the search rankings.
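The post does not say how the Repository stores these relationships internally. As a hypothetical sketch only: if each reference is recorded as a pair of item identifiers (the referencing item and the item it points at), an item's reuse can be counted as the number of distinct items that reference it. `ReuseCounter` and the plain-string identifiers are illustrative assumptions, not part of Colectica.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not a Colectica API. Identifiers are plain strings
// standing in for agency/id/version triples.
public static class ReuseCounter
{
    // Each pair means "From references To", e.g. a question using a concept
    // or a variable represented by a code scheme. Returns, for every
    // referenced item, how many distinct items point at it.
    public static Dictionary<string, int> CountReferrers(
        IEnumerable<(string From, string To)> references)
    {
        var referrers = new Dictionary<string, HashSet<string>>();
        foreach (var (from, to) in references)
        {
            if (!referrers.TryGetValue(to, out var set))
                referrers[to] = set = new HashSet<string>();
            set.Add(from); // a duplicate reference from the same item counts once
        }

        var counts = new Dictionary<string, int>();
        foreach (var pair in referrers)
            counts[pair.Key] = pair.Value.Count;
        return counts;
    }
}
```

Counting distinct referrers rather than raw references keeps one item that mentions a concept twice from inflating that concept's reuse.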

Introducing DDI 3 metadata ranking

Search results, such as those for Gender, are now ranked not only on the information in each DDI 3 item but also on how often the item is reused and harmonized across the waves of this longitudinal study.
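The exact weighting Colectica uses is not described in this post. One hypothetical way to fold reuse into a ranking is to multiply a text-relevance score by a damped function of the reuse count. `SearchHit`, `TextScore`, `ReuseCount` and `reuseWeight` are made-up names for this sketch, not Colectica APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical search hit; these names are illustrative, not Colectica APIs.
public record SearchHit(string Name, double TextScore, int ReuseCount);

public static class ReuseRanking
{
    // Boosts the text-match score by the log of (1 + reuse count). An item
    // that is never reused keeps exactly its text score.
    public static double Score(SearchHit hit, double reuseWeight = 0.5) =>
        hit.TextScore * (1.0 + reuseWeight * Math.Log(1.0 + hit.ReuseCount));

    // Highest combined score first. OrderByDescending is a stable sort,
    // so hits with equal scores keep their original order.
    public static List<SearchHit> Rank(IEnumerable<SearchHit> hits, double reuseWeight = 0.5) =>
        hits.OrderByDescending(h => Score(h, reuseWeight)).ToList();
}
```

The logarithm means the tenth reuse adds much less than the first, so a heavily shared item rises among comparable text matches without swamping a clearly better match.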

This is also a great new feature for users of Colectica Designer. When a user opens the item picker to create a reference, their search results will also list the most reused items more prominently. This will help users find the items that already have the most influence and increase the comparability of their published research. Please let me know what you think of the new search rankings or if you have any ideas for how they can be even further refined.

Posted in Colectica

WPF DataGrid and the Backspace Character

Colectica uses the DataGrid throughout its user interface for displaying editable lists. Today, while editing some metadata, I wanted to remove the text in a particular column for many rows. This is pretty quick if you get into the rhythm of F2-Backspace-Enter, F2-Backspace-Enter, F2-Backspace-Enter…

Things don’t go quite so well if you miss the F2 part of the pattern and just press Backspace-Enter on a cell. The WPF DataGrid will actually replace the contents of the cell with the backspace character. Depending on how you look at your string, this might show up as 0x08, &#x8;, \u0008, or \b.

This CodePlex post confirms the bug in the DataGrid and includes some workarounds. On our end, the current fix is simply to ignore strings that have a backspace character in them. This way they don’t end up in the XML, which is good because in XML 1.0 the backspace is an illegal character.
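A minimal sketch of that workaround, assuming the check runs on the edited cell text before it is written back to the model. `CellTextFilter` and `AcceptEdit` are hypothetical names, and treating "ignore" as "keep the previous value" is an assumption. `XmlConvert.IsXmlChar` (.NET Framework 4 or later) and `char.IsSurrogatePair` are standard .NET methods.

```csharp
using System;
using System.Xml;

// Hypothetical helper illustrating the "ignore strings with a backspace" fix.
public static class CellTextFilter
{
    // True if the text contains U+0008, the character the DataGrid writes
    // into a cell when Backspace-Enter is pressed without F2.
    public static bool ContainsBackspace(string text) =>
        text != null && text.IndexOf('\b') >= 0;

    // Broader check: can every character be written to an XML 1.0 document?
    // IsXmlChar rejects U+0008 and the other disallowed control characters;
    // valid surrogate pairs are accepted as a unit, lone surrogates are not.
    public static bool IsXml10Safe(string text)
    {
        if (text == null) return true;
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsSurrogatePair(text, i))
            {
                i++; // skip the low half of a valid pair
                continue;
            }
            if (!XmlConvert.IsXmlChar(text[i])) return false;
        }
        return true;
    }

    // Assumed policy: drop an edit containing a backspace, keep the old value.
    public static string AcceptEdit(string oldValue, string newValue) =>
        ContainsBackspace(newValue) ? oldValue : newValue;
}
```

`IsXml10Safe` is the more general guard: it also catches other control characters that XML 1.0 forbids, not just the one the DataGrid produces.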

Posted in .net

How to register an 11179 item in Colectica Repository

Colectica Repository can store any metadata items that conform to the ISO 11179 naming scheme for registered items. The DDI 3 Addin for Colectica Repository additionally allows for indexing of contextual and relationship information. Here is a brief code example showing how DDI 3 items can be registered in the Colectica Repository.

First we will create some DDI 3 based metadata using the Colectica SDK. If you don’t have the SDK you can create the DDI 3 by hand or using your favorite XML library.

// Create a DDI 3 Concept using the Colectica SDK
Concept concept = new Concept() { AgencyId = "example.org" };
concept.ItemName["en-US"] = "Given Name";
concept.Description["en-US"] = @"A character-string (e.g. 'Billy' and 'Peter')
        given to people as a first name (or, in most Western countries, as a
        middle name), usually shortly after birth.";

// Create a DDI 3 Question using the Colectica SDK
Question q1 = new Question() { AgencyId = "example.org" };
q1.QuestionText["en-US"] = "What is your first name?";
TextDomain domain = new TextDomain();
domain.Label["en-US"] = "First Name";
q1.ResponseDomains.Add(domain);

// Link the question and concept
q1.Concepts.Add(concept);

Then we will create the repository client, using the supplied credentials.

// Create the web services client
var client = new WcfRepositoryClient(
    "username", "password", "localhost", 19893);

We can register any object created with the Colectica SDK using the built-in mappings.

// Register an 11179 administered item using
// the Repository Client helper functions
client.RegisterItem(concept, new CommitOptions());

Alternatively, we can access the web services layer and construct the proper SOAP payload.

// Register an 11179 administered item using the Web Services directly
Collection<Note> notes = new Collection<Note>();
string serialization = q1.GetXmlRepresentation(notes).OuterXml;
RepositoryItem ri = new RepositoryItem()
{
    CompositeId = q1.CompositeId,   // agency, id, and version
    Item = serialization,           // item's serialization
    ItemType = q1.ItemType,         // model defined item type identifier
    IsDepricated = false,
    IsPublished = q1.IsPublished,
    IsProvisional = false,          // only used in the local repository
    Notes = notes,                  // notes about the item being registered
    VersionDate = q1.VersionDate,
    VersionRationale = q1.VersionRationale,
    VersionResponsibility = q1.VersionResponsibility
};
client.RegisterItem(ri, new CommitOptions());

As you can see, we added both the DDI concept and the DDI question item to the repository. The Colectica SDK has methods to gather all items that are linked and create sets of items to be registered. It also has the ability to detect changed items automatically, so a program can quickly determine which items should have new versions registered after a user action.

Update 1: Registering items in a DDI instance

Here is more information about registering items in the Repository, based on some follow-up questions.

When adding a concept to a question, the hierarchical relationship is established. Is this inline, or by reference at the XML level?

The DDI 3 standard allows items to be included either inline or by reference in many locations. Colectica Repository will process and store items in either format. If it is a DDI 3 item, the Repository will additionally index the text and relationship information about the item using the DDI 3 Addin. Note that only the item being registered and its relationships are processed; each item must still be registered individually or in a batch operation.

Colectica Designer will always use the referencing mechanisms in DDI when interacting with the Repository. This is for speed of processing and to allow the easiest sharing and harmonization of items across multiple Studies and Instances. You can learn more about how Designer determines item boundaries by reading about Concise Bounded Descriptions.

How can I register all items in a DDI Instance? How can I update only the changed items in a DDI instance?

If you are using DDI with an XML library, you can use the following XPath queries to find all the items in your Instance to register.

//*[@isVersionable]
//*[@isMaintainable]

You can then loop over the XML nodes returned by the XPath and register the results. If you are using the Colectica SDK, you can find all items in an Instance as follows:

// obtain your DDI Instance in some fashion
DdiInstance instance = YourDdiInstance();

// Find all items
ItemGatherer gatherer = new ItemGatherer();
instance.Accept(gatherer);
Collection<IVersionable> allItems = gatherer.Items;

// You can also find only the modified items
// (a second variable name avoids redeclaring 'gatherer' in the same scope)
DirtyItemGatherer dirtyGatherer = new DirtyItemGatherer();
instance.Accept(dirtyGatherer);
Collection<IVersionable> changedItems = dirtyGatherer.DirtyItems;
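For the XML-library route, a minimal sketch using `XmlDocument.SelectNodes` might combine the two XPath queries with the union operator, which returns an element carrying both attributes only once, in document order. `VersionableFinder` is a hypothetical helper; registering each element would still go through the Repository client.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not part of the Colectica SDK.
public static class VersionableFinder
{
    // '*' matches elements in any namespace, and the DDI attributes are
    // unqualified, so no XmlNamespaceManager is needed for this query.
    public const string Query = "//*[@isVersionable] | //*[@isMaintainable]";

    // Returns every versionable or maintainable element exactly once.
    public static List<XmlElement> FindItems(XmlDocument doc)
    {
        var items = new List<XmlElement>();
        foreach (XmlNode node in doc.SelectNodes(Query))
            items.Add((XmlElement)node);
        return items;
    }
}
```

Running the two queries separately and concatenating the results would list an element twice if it carries both attributes; the union avoids that.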

How do I export a whole instance to DDI 3?

There are several ways to export a DDI instance from the Repository. One is to use the Repository’s command-line tools to write an XML document. Another is to export a DDI instance programmatically using the SDK:

// obtain your DDI Instance in some fashion.
// Only the identification is needed since we will populate the instance
DdiInstance instance = YourDdiInstance();

// populate the entire class hierarchy from the Repository
GraphPopulator populator = new GraphPopulator(client);
instance.Accept(populator);

// Create the XML document for the DDI Instance
DDIWorkflowSerializer serializer = new DDIWorkflowSerializer();
XmlDocument doc = serializer.Serialize(instance);

A third option is to use an XML library and construct the DDI Instance.

Posted in Colectica

New Colectica Repository access methods

Colectica Repository is used as both a registry and a resolution service for various pieces of identified metadata. Both Colectica Designer and Colectica Web communicate with it to perform all of the neat tasks listed on their features pages. Users can call these same services to create their own applications and leverage all of this built-in functionality. By default, we supply SOAP 1.2 WS-* and net.tcp endpoints for communicating with the server remotely. These are the industry heavyweights in enterprise SOA.

Recently a client asked to use SOAP 1.1 with the WS-I Basic Profile. Thanks to the Repository’s decoupled design, we were able to add this very quickly. All of these endpoints use a secure transport channel such as SSL/TLS. How quickly new access methods can be added got me thinking about what other types of endpoints and serializations might be useful. Adding both SPARQL and REST immediately came to mind.

Colectica Repository already has an excellent relationship and set-based querying system. Adding a SPARQL endpoint would allow users to process those relationships and the associated data with a standardized query language. The RDF serialization would be a subset of the official DDI object model. When the DDI URN format is agreed upon I will look into this more. If you like this idea, tell us you would like to see feature ticket #1181 implemented.

REST services make it very convenient for users on various platforms to create access clients. Since all metadata stored in the Repository are identified consistently, it should be simple to build a basic access model. Exposing some of the Repository’s more advanced functions would be a bit more challenging, but for simple resolution this would work well. REST could also make use of existing HTTP caching, as published versions of the metadata do not change.
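To make the caching point concrete, here is a hypothetical sketch. The URL layout and header values are assumptions for illustration, not an existing Colectica endpoint, since REST support is only proposed in ticket #1182. It builds one resource path per agency, identifier and version, and marks published versions as cacheable indefinitely.

```csharp
using System;

// Hypothetical REST design sketch; not a Colectica API.
public static class RestDesignSketch
{
    // One resource per agency/id/version triple, e.g.
    // /items/example.org/{id}/3. Each segment is percent-encoded.
    public static string ItemPath(string agency, string id, long version) =>
        $"/items/{Uri.EscapeDataString(agency)}/{Uri.EscapeDataString(id)}/{version}";

    // Published versions never change, so caches may keep them for a year
    // and skip revalidation ("immutable"). Unpublished versions may still be
    // edited, so caches must revalidate them on every request.
    public static string CacheControl(bool isPublished) =>
        isPublished ? "public, max-age=31536000, immutable" : "no-cache";
}
```

Because the version number is part of the path, a new version gets a new URL, so a long cache lifetime on an old URL never serves stale metadata.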

Aside from native DDI 3, JSON is an obvious candidate for the serialization format, but speed is always a concern. I have been looking at several new binary serialization formats:

  • Google Protocol Buffers: Protocol buffers are fast, simple, compact, and cross platform. I have seen benchmarks where they are faster than the net.tcp binary serialization we currently ship.
  • BSON: Binary JSON is another option and is very similar to Protocol Buffers, but is not tied to a schema.

I’ve added REST support as feature ticket #1182; again, let us know if that interests you. The next version of Colectica Repository additionally supports SOAP 1.1. Are there any other ways that you would like to access the services?

Posted in Colectica