Semantic MediaWiki development activities

On this page, developers of Semantic MediaWiki and related efforts document their current activities. It helps to improve coordination among project members, and it also serves as a changelog. Finished tasks should be explained as features in the online help documentation for Semantic MediaWiki.

Features that are marked here as implemented may or may not be present in the version of the SMW code running on this wiki; the latest SMW code is available from the Subversion repository (but may not work!).

Tasks on this page should be edited only by the respective developers.

Note that you can easily link to bugs on SourceForge and MediaZilla by using the interwiki-prefixes smwbug: and mediazilla:, or Template:SMWbug.

New developers should observe the style guidelines for their code. Comments in code are extracted by Doxygen to create online documentation (click Files and scroll down to extensions/SemanticMediaWiki).

Version 0.7
'''The below list is outdated. See current activities above.'''

Hierarchies of relations and attributes are needed. The main challenge is to support them in inline queries, but a well-formed RDF export is also required.

Searching and querying
Users write queries in article source, and results are shown in the article. Templates can now be used for constructing queries, as well as for printing query results (format="template").

Two functions are way too slow right now: the materialisation of the category hierarchy and the identification of articles with mutual redirects. Those need simplification and reimplementation.
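
As a rough illustration of the two computations named above, the following Python sketch materialises a category hierarchy by breadth-first traversal and finds pairs of pages that redirect to each other. The dictionary-based inputs and the function names are assumptions made for this example; SMW itself is written in PHP and works against database tables, so this only shows the shape of the problem, not the actual implementation.

```python
from collections import deque


def materialise_subcategories(children, root):
    """Return every category reachable below `root`.

    `children` is a hypothetical dict mapping a category name to the list
    of its direct subcategories. Cycles in the hierarchy are tolerated.
    """
    seen = set()
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            # Skip categories already collected, and never report the root
            # as its own subcategory even if the hierarchy is cyclic.
            if child not in seen and child != root:
                seen.add(child)
                queue.append(child)
    return seen


def mutual_redirects(redirects):
    """Return sorted pairs (a, b) where a redirects to b and b back to a.

    `redirects` is a hypothetical dict mapping a page to its redirect target.
    """
    pairs = set()
    for source, target in redirects.items():
        if source != target and redirects.get(target) == source:
            pairs.add(tuple(sorted((source, target))))
    return sorted(pairs)
```

Both functions visit each edge once, so the cost grows linearly with the number of subcategory links or redirects.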

In addition to the current Timeline format, which shows one (event) or two (duration) dates for each returned article, there will be a new format "eventline" that shows all dates selected in the query. Color-coding and pre/appended article names enable users to tell which date came from which query result. In this way, many dates from many articles can be displayed.

A new format enables embedding article texts of articles retrieved with an inline query.

Print the number of results instead of the results themselves.

Try to eliminate all FIXMEs.

The current simple semantic search is outdated. Required features are: support for many results split over many result pages, improved UI, cleaned up implementation (which is not efficient right now), faceted browsing features (e.g. results could provide further quicklinks).

External services and reuse
Rewrite good portions of the RDF export to become more performant and easier to maintain.

It should be possible to use a special page similar to Special:Ask for retrieving RDF selectively.

The wiki currently does not prevent "meta-modelling" in the sense that categories and other annotations can again be annotated. To enable compatibility with OWL-based tools, it should be possible to constrain output to be in OWL, even if the wiki is not. This is currently done by just dropping offending annotations, but a cooler way would be to create new annotation URIs and to describe those as AnnotationProperties. This will be implemented as follows:
 * normal properties on normal articles are exported as usual
 * properties can be declared as annotation properties in the first place (see below), and in this case they are always exported literally (as everything is now)
 * non-annotation properties that involve TBox-elements are interpreted as annotation properties and a new URI is created for this purpose. The annotation property's description cannot directly be accessed in the wiki and contains only a label and a link to its original non-annotation property URI.
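
The three rules above can be sketched as a small decision function. This is only an illustration in Python: the base namespace, the "#annotation" suffix used for the newly created URI, and the function and field names are all assumptions, since the plan does not say how the new URIs are formed.

```python
# Placeholder namespace for the example, not the URI scheme SMW really uses.
BASE = "http://example.org/wiki/Property:"


def export_plan(name, declared_annotation, involves_tbox):
    """Decide how a property statement would be exported.

    name                -- property name (hypothetical input)
    declared_annotation -- True if the property was declared an annotation property
    involves_tbox       -- True if the statement involves a TBox element
    """
    uri = BASE + name
    if declared_annotation:
        # Declared annotation properties are always exported literally.
        return {"uri": uri, "kind": "annotation", "derived_from": None}
    if involves_tbox:
        # A new annotation URI is created; its description only carries a
        # label and a link back to the original property URI.
        # The "#annotation" suffix is an invented convention for this sketch.
        return {"uri": uri + "#annotation", "kind": "annotation", "derived_from": uri}
    # Normal properties on normal articles are exported as usual.
    return {"uri": uri, "kind": "normal", "derived_from": None}
```

For example, `export_plan("population", False, True)` yields a derived annotation URI that points back to the original property, while the same property used on a normal article is exported unchanged.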

Redirects are not considered in the RDF export right now. They should be owl:sameAs statements for articles, owl:equivalentProperty for attributes and relations and owl:equivalentClass for classes.
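
A minimal sketch of this mapping in Python follows. The OWL predicate names come from the paragraph above; the kind labels, the function name, and the example URIs are assumptions for illustration only.

```python
# Which OWL predicate expresses a redirect, by kind of wiki page.
REDIRECT_PREDICATES = {
    "article": "owl:sameAs",
    "attribute": "owl:equivalentProperty",
    "relation": "owl:equivalentProperty",
    "class": "owl:equivalentClass",
}


def redirect_triple(kind, source_uri, target_uri):
    """Return the (subject, predicate, object) triple for one redirect."""
    return (source_uri, REDIRECT_PREDICATES[kind], target_uri)
```

An unknown kind raises a `KeyError`, which seems safer for an export than silently guessing a predicate.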

The import of outside ontologies is possible. It adds missing statements from an ontology to the wiki.

There's a bug in the upload: for more complex ontologies, an entity can be listed several times.

The OI code is messy and has various problems and limitations. It should either be cleaned up, or our current solution of using Python-based refactoring should be polished to be shippable as a maintenance application.

Needed: APIs in a variety of languages to include data from an SMW wiki in your applications. No knowledge of PHP required, just your own native language, be it Python, C/C++, .NET (C#, VB.NET, J#, IronPython, etc.), Java, or COBOL if you insist :) The language should have an RDF library though, or the task will grow rather big.

Datatype support
The date/time implementation is still insufficient since it handles dates only near present times. It cannot deal with anything before 1901-12-14 or after 2038-01-19.
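
These two limits coincide with the range of a signed 32-bit count of seconds since 1970-01-01 UTC, which is presumably where they come from (this is an inference, not something stated here). The Python sketch below computes both ends of that range; the names are chosen for the example only.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MIN = -2**31        # smallest signed 32-bit value
INT32_MAX = 2**31 - 1     # largest signed 32-bit value


def from_unix(seconds):
    """Convert a count of seconds since the epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


earliest = from_unix(INT32_MIN)
latest = from_unix(INT32_MAX)
print(earliest.isoformat())
print(latest.isoformat())
```

The lower end falls in the evening of 1901-12-13 UTC, so 1901-12-14 is the first full day that can be represented; the upper end falls early on 2038-01-19 UTC.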

Note: PHP 5.2 has a DateTime object which is a wrapper around a 64-bit integer. However, it is currently limited to the range 1 AD to 9999 AD.

New tentative plan: do our own date parsing. Instead of converting to a timestamp of seconds since 1970, just convert to a number that provides accurate ordering. Still based around the same epoch so nearby times have precision extending into seconds for sorting.
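
One way such an ordering number could be computed is sketched below in Python. It counts days in the proleptic Gregorian calendar with a well-known integer algorithm and adds the seconds of the day, so the result equals the usual timestamp for dates near 1970 but is defined for any year. The function names, the use of astronomical year numbering (year 0 = 1 BC), and the choice of a float result are assumptions for this example, not part of the plan.

```python
def days_from_civil(year, month, day):
    """Days since 1970-01-01 in the proleptic Gregorian calendar.

    Works for any integer year, including negative ones
    (astronomical numbering: year 0 is 1 BC).
    """
    y = year - 1 if month <= 2 else year      # treat Jan/Feb as end of previous year
    era = y // 400                            # floor division also handles negative years
    yoe = y - era * 400                       # year within the 400-year era, 0..399
    shifted_month = month - 3 if month > 2 else month + 9   # March becomes month 0
    doy = (153 * shifted_month + 2) // 5 + day - 1           # day within the shifted year
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy            # day within the era
    return era * 146097 + doe - 719468


def sort_key(year, month=1, day=1, hour=0, minute=0, second=0):
    """Return a float that orders dates correctly; exact to the second near 1970."""
    seconds = days_from_civil(year, month, day) * 86400
    seconds += hour * 3600 + minute * 60 + second
    return float(seconds)
```

A double-precision float represents whole seconds exactly up to 2^53, so second-level ordering is kept across all of recorded history, while much more distant dates would still sort correctly at coarser precision.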

Open Issue: should Type:Date figure out whether to export as XSD type #date or #datetime based on whether there's a time component?

Note that in addition to historical dates, there are also geologic time scales, e.g. "Tyrannosaurus rex flourished approximately 65 million years ago". Although in theory you can represent this as XSD #date of -65000000, we will not attempt to handle this with Type:Date; instead, just use an Attribute:Geologic time using the custom float unit Type:Time.

PHP understands timezone identifiers, but if no timezone is given, a default timezone of the wiki should be applied. Furthermore, it should be explicitly stated what timezone some time refers to in the infobox. (However, this doesn't make sense for historical dates. Maybe only support timezone for dates with times?)

Sorting dates. Dates don't reliably sort and queries don't always work. See Type:Date

Permit specifying a format for date/time values in inline queries, to, e.g. just show the year or the day and month for birthdays.

Type:Boolean page should document that using a Category may be better.

See alternative proposed in "enum attribute type" thread on Semediawiki-user.

The OWL ontology language distinguishes annotation properties (which could be compared to comments) from the more semantic datatype and object properties. To enable meta-modelling, SMW should allow users to declare some properties as annotation properties (see plans on RDF export for details on how this affects RDF). The plan is to allow properties to have additional statements of the form has Type::Type:Annotation which are then separated from normal type statements during parsing and saving. The advantage is that we can reuse "has type" instead of making something new, and that we have a link to a possibly enlightening page "Type:Annotation". The disadvantage is the additional mix-up of types and other properties. A similar scheme could be used for future features like symmetry or transitivity.

This feature would make the current Type:AnnoURL obsolete, but would require the vocabulary import features to have markers for annotation properties.

We were repeatedly asked to support text attributes longer than 255 characters. Since the current SMW database tables do not have space for such content, an additional table could be used. The attribute table could use the value and unit fields for a hash and a key of the original text. Computing the hash in a datatype handler before searching would then enable querying to quite some extent. Using the unit field for a key might be less clean ... Finally, some care may be needed so that the long-text data does not become overly large either.
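
The hash-plus-key idea can be sketched as follows in Python, with in-memory structures standing in for the database tables. The class, its method names, the row layout, and the choice of SHA-1 as the hash are all assumptions made for this example.

```python
import hashlib


class LongTextStore:
    """Toy model: the attribute row keeps a hash and a key, a side table keeps the text."""

    def __init__(self):
        self.attribute_rows = []   # (page, attribute, value_field, unit_field)
        self.long_texts = {}       # key -> full text, stand-in for the additional table
        self._next_key = 1

    @staticmethod
    def digest(text):
        # A SHA-1 hex digest is 40 characters, well within a 255-character field.
        return hashlib.sha1(text.encode("utf-8")).hexdigest()

    def save(self, page, attribute, text):
        key = str(self._next_key)
        self._next_key += 1
        self.long_texts[key] = text
        # value field holds the hash, unit field holds the key of the full text
        self.attribute_rows.append((page, attribute, self.digest(text), key))

    def find_pages(self, attribute, text):
        """Exact-match query: hash the search text, then compare hashes."""
        wanted = self.digest(text)
        return [
            page
            for page, attr, value, unit in self.attribute_rows
            if attr == attribute
            and value == wanted
            and self.long_texts[unit] == text   # guards against hash collisions
        ]
```

Note that this only supports exact-match queries; substring or range queries on the long text would not work through the hash.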

Interface improvements
The Special:Types became rather useless with the advent of custom types. It needs to be able to display custom and builtin types, and to show their supported units, if any.

When a result table is sorted by a date or numeric column, most of the time it gets sorted by its lexicographic order, not by the numerical one. This is a weakness in SMWIP/skins/SMW_sorttable.js. It would be useful if the sorting script could refer to some (invisible) HTML-parameter to sort columns instead of using the value string (which is of unknown form and might be very complicated to parse). Just a heads up: it looks like base MediaWiki will soon have similar table sort code in it ( is fixed). -- Skierpage 00:46, 16 December 2006 (CET)
 * inline queries could return the value_num for any attribute that isNumeric, hide this somewhere in the HTML, and tweak sorttable.js to check for this in its sort function.
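
The difference between the two orderings can be illustrated with a short Python sketch. The real fix would live in the JavaScript sorting script; the row structure and the field names "display" and "value_num" used here are assumptions that only mirror the idea of carrying a hidden numeric value next to the displayed string.

```python
# Each row carries the string shown to the user and a hidden numeric value
# (here: the length normalised to kilometres).
rows = [
    {"display": "9 km", "value_num": 9.0},
    {"display": "10 km", "value_num": 10.0},
    {"display": "100 m", "value_num": 0.1},
]


def sort_lexicographic(table_rows):
    """What happens today: rows are ordered by the displayed string."""
    return sorted(table_rows, key=lambda row: row["display"])


def sort_numeric(table_rows):
    """Proposed behaviour: rows are ordered by the hidden numeric value."""
    return sorted(table_rows, key=lambda row: row["value_num"])


print([row["display"] for row in sort_lexicographic(rows)])
print([row["display"] for row in sort_numeric(rows)])
```

The string ordering puts "10 km" before "9 km", while the numeric key gives the order a reader expects without the script having to parse units at all.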

PROTOCOL
 * AJAX library choice was made (www.script.aculo.us)
 * Implementation of one example. (Suggestions were made by array)
 * Searching for a solution for the interface between AJAX and MySQL
 * New try with the SAJAX library (coded in PHP) and class.inputfilter.php5
 * An example in SMW_Special implemented making DB queries
 * Implementation for the current version of SMW 0.4
Current aim is to implement the autocomplete example in a simpler and more cleanly coded way. I've found a problem with the IE browser, which I have to fix now!

Bugfixes/Cleanup
Much of the functionality of SMW is supported by wiki pages (e.g. help pages, standard user-defined types with units for area, length, time, etc., the coordinates infolink services, and so on), so we should consider providing dumps of these pages as part of the release to help set up systems.

PHP5 sufficiently supports keywords like protected, private, static, and interface. They should be used eagerly.

The JavaScript and the process of inserting it into a page are not optimal at the moment. The replacement process needs a second parse that in fact breaks most of the things one could want to write within a tooltip. In particular, it breaks when errors are reported in tooltips (since this requires a span-tag).

Changes in older versions

 * Semantic MediaWiki 0.6 changes
 * Semantic MediaWiki 0.5 changes
 * Semantic MediaWiki 0.4 changes
 * Semantic MediaWiki 0.3 changes