Talk:ABCDEF

30.01.2006: User:Max Völkel
> I spoke briefly with John Domingue a few weeks ago, and he said he was interested in using the ABCDEF for the entire conference, as long as it complies with the Springer LaTeX formats. Is it possible for you to contact him directly? Otherwise I'm happy to act as an intermediary.

Issue: The Springer LaTeX template has an abstract, but you don't want one. So I see no way to use the Springer LaTeX template and the ABCDEF at the same time.
 * Idea: use a tweaked LaTeX template that is transformed to the Springer format (extracting the core (abstract) sentences and copying them into the abstract section)
 * Con: people would have to learn how to use a different template
 * Idea: ignore the abstract in LaTeX
 * Con: this would throw information away
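A minimal sketch of the first idea, assuming a hypothetical `\core{...}` macro in the tweaked template (defined as something like `\newcommand{\core}[1]{#1}` so it typesets as plain text). The script copies every marked sentence into the `abstract` environment used by Springer's LLNCS class and keeps the sentences in the body, so nothing is thrown away. The macro name and the function `to_springer` are illustrative, not part of any existing template; "body" means the text after `\maketitle`, and the sketch assumes core sentences contain no nested braces.

```python
import re

# Hypothetical macro from the tweaked template: \core{...} marks a core sentence.
# Assumes core sentences contain no nested braces.
CORE = re.compile(r"\\core\{([^{}]*)\}")


def to_springer(body: str) -> str:
    """Prepend an abstract built from the \\core{...} sentences and keep
    those sentences, unwrapped, in the body text."""
    cores = CORE.findall(body)
    # A function replacement avoids re treating backslashes in the text as escapes.
    plain_body = CORE.sub(lambda m: m.group(1), body)
    abstract = "\\begin{abstract}\n" + " ".join(cores) + "\n\\end{abstract}\n"
    return abstract + plain_body


if __name__ == "__main__":
    src = r"We study X. \core{We propose a format for semantic papers.} More text."
    print(to_springer(src))
```

Authors would still write one template; the Springer version is generated, which addresses the "different template" objection only partly.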

> - give the three "core" sentences as an abstract to the contribution, allowing access to the appropriate section.

This implies rendering all content on a web page, and I don't know how to do that so that it looks readable. It means we must understand LaTeX and render it to HTML - not easy.

> Note that there is NO abstract; that is a critical aspect of the ABCDEF. There is just a set of sentences that summarise the Background/Contribution/Discussion and that together form a summary.

> Also, there is still a small error in the picture that is up on the wiki - the "A" does not stand for Author but for Annotation, that is Author/Title/Bibref etc. - basically it could be Dublin Core (without the summary). So if A == DC that would be good, I think (which might make the entire format a simple extension of DC??)

Re-using Dublin Core is perfect.

entities not attributes?
Yes Eyal, this is fantastic - one small question: why did you change the term "entity" to "attribute" in the picture? An entity could also be a reference - plus it spoils the alphabetic perfection :)!

Also: should we have an ABCDEFG mailing list?

--145.36.10.111 13:54, 11 November 2005 (CET)Anita

entities
I didn't make the picture; Heiko was kind enough to do so. Heiko, could you replace the 'attributes' surrounding a paper with 'entities'? You could cluster them by group, such as 'people', 'references', 'projects', to show what we mean by 'entities'.

pictures and updates
Great work guys! I updated the example to reflect Heiko's pictures - but the text is not yet updated, I'll email about that - and moved the images around so the flow is better (I think; if not, please move them around again!). Also, I posted some photos of our dinner on Flickr --145.36.10.111 15:11, 11 November 2005 (CET) Anita

Is LaTeX the only input format?
Although interesting, it seems to me that other input formats should be allowed as well. A very large portion of authors (if not the majority) do NOT use LaTeX. For instance, I believe a similar approach can be applied to RTF, especially if the authors use styles as required in some publisher formats (e.g., Springer). In addition, there should be an XHTML alternative as well. IW3C2 is considering one such format for the online proceedings of WWW conferences.

Daniel

Let's do Word as well?
Hi Daniel, thank you for contributing - we would like to do Word as well, perhaps just a template we can transform? What does everyone else think?

(RTF is the tagged Word format, not unlike LaTeX in many respects) - Daniel

PS: I changed the "A" to stand for "Annotation", which is what it was originally intended for anyway. The difference between entities and annotations is that annotations refer to the entire contribution (such as title, authors, etc.), whereas entities link to a specific item or object in the text. OK? Anita

Example
Hi, here is a self-descriptive example of ABCDEF - editing and comments welcome! This is just an HTML mockup; my colleague Simon Pepping is making a RELAX NG schema for it.

(Annotation:
dc:Title The ABCDE Format: Publishing Semantic Conference Papers
dc:Creator.PersonalName Anita de Waard
dc:Creator.PersonalName.Address anita@cs.uu.nl
dc:Creator.2 Simon Pepping
dc:Creator.Address.2 s.pepping@elsevier.com
dc:Subject *** I.7.: Editing, Text
dc:Subject Semantic Web
dc:Subject Wikis
dc:Type Text.Proceedings
dc:Identifier http://www.semwiki.org/2006/
dc:Identifier URN dewaardsemwiki2006
dc:Language ISO639-1 en
dc:Date.X-MetadataLastModified ISO8601 2006-02-02
/Annotation)
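To show that an Annotation block like this one can be read mechanically, here is a small Python parser. It is a sketch that assumes every element name starts with `dc:` and that no value itself contains ` dc:`; it works whether the elements are on one line or one per line. It returns (name, value) pairs rather than a dictionary because elements such as `dc:Subject` repeat. The function name `parse_annotation` is hypothetical; the RELAX NG schema mentioned above would be the proper place to pin down this syntax.

```python
import re

# An element is a "dc:" name (e.g. dc:Title, dc:Creator.PersonalName) followed by
# a value that runs until the next "dc:" name or the closing "/Annotation)".
ELEMENT = re.compile(
    r"(dc:[\w.\-]+)\s+(.*?)(?=\s+dc:[\w.\-]+\s|\s*/Annotation\))", re.S
)


def parse_annotation(text: str) -> list[tuple[str, str]]:
    """Return (element, value) pairs in document order; repeated elements are kept."""
    return ELEMENT.findall(text)
```

Qualifiers such as `ISO8601` stay inside the value; splitting them off is left to whatever consumes the pairs.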

The ABCDE Format: Publishing Semantic Conference Papers
* We believe that the best way to present a narrative to a computer is to let the author explicitly create a rich semantic structure for the article during writing.
* As conceptual structures become the central bearer of information, a set of structured documents can be integrated to form a ‘knowledge network’, or structured package of related knowledge regarding a topic.
* We propose an open-standard, widely (re)useable format, the ABCDE Format (ABCDEF) for proceedings and workshops contributions that can be easily mined, integrated and consumed by semantic browsers and wikis.
* There is no abstract in an ABCDE document - instead, within the B,C and D paragraphs the author denotes 'core' sentences. Upon retrieval or rendering of the article, these can be extracted to form a structured abstract of the article - where one can jump directly to the core of the Background, Contribution or Discussion.
* We aim to work on different incarnations of this format and open it up to modification and development.

Background
(Background:

(Entity: Object (type: text) = (Background paragraph), relation = Footnote, Subject (type: text) = "This background is a copy of the Introduction paragraph of (Entity: Object (type: text) = (this contribution), relation = Reference, Subject (type: URI) = http://labs.elsevier.com/resources/adw/papers/SWDaysDeWaard1209.pdf)). It is an essential property of semantic conference contributions that they can be composed in a modular format, i.e. linking to or reusing parts of existing documents.)

“There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers - conclusions which he cannot find time to grasp, much less to remember, as they appear.” (Bush, 1945)

Scientists are increasingly unable to process the ever-growing flood of scientific literature that surrounds them. Biomedical literature, for instance, grows by over 500,000 publications each year (Cohen, 2005). In a recent study on user needs among British archaeologists, 71% of the respondents felt that information was being produced of which they were unaware (Jones, 2001). Besides problems in accessing one’s own field, it is becoming more and more difficult to access adjacent domains of science. Furthermore, scientists do not only want to know which publications contain specific words, and how to rank them by relevance, but also what knowledge is contained within the papers, and how it relates to their existing knowledge. For example, cell biologists might want to know: “What functions of this gene are known?” Astronomers might ask “What radiation patterns have we seen in red-dwarf stars?” or “What theories does this new observation support?” Ideally, a new publication should situate itself within the existing knowledge context of the reader, and show how it affects or alters this context.

There have been many efforts to combat information overload in science. Abstracts were developed in the sixties and seventies. Although they are quicker to read, abstracts do not provide a full summary of the work described in the document, nor do they offer any way to integrate the document into existing knowledge. Metadata is a broad term covering many different types of information, but it generally includes the bibliographic reference to a document and descriptors such as keywords. Metadata helps retrieve an article when descriptive elements (author, title) are known, and the main function of a keyword list is to classify the article in a category. But neither provides any direct insight into the knowledge conveyed within the body of a scientific paper.

Text mining and information extraction are methods specifically developed to find relevant information in unstructured texts and encode that information in a structured form, like a database record (Couto, 2003). In theory, text mining is the perfect solution for transforming factual knowledge from publications into database entries. However, automatically identifying concepts such as genes and proteins poses many problems (see e.g. Mons (2005) and Cohen (2005)). Moreover, computational linguists have not yet developed tools that can correctly analyse more than 30% of English sentences and transform them into a structured formal representation. For this reason, papers still need to be handled by a curator (Rebholz-Schuhmann, 2005).

The main problem with automatically extracting information from scientific articles is that the genre of the scientific publication has developed into an indivisible information unit (see e.g. Bazerman (1998)). The scientific paper is a self-contained narrative, created anew in each iteration, with specific genre characteristics that minimize the potential for identification, content reuse and knowledge integration. All this rhetorical freedom comes at the expense of usability in a computer-centered environment. The linear narrative was fine when we still read and wrote on paper, but the changing (digital) environment in which scientists live and work calls for a change in the fundamental unit of communication. (Core1: We believe that the best way to present a narrative to a computer is to let the author explicitly create a rich semantic structure for the article during writing /Core1) (see also de Waard, 2005). At a high level, this structure will consist of self-contained modular elements or entities, and discourse relationships between such elements (within a text, and between texts). The tension between these self-contained ‘knowledge elements’ or conceptual structures, and the meaning conveyed in the conventional narrative of the document as a whole, is an interesting topic of study in terms of both knowledge modeling and rhetoric/discourse studies. (Core2: As conceptual structures become the central bearer of information, a set of structured documents can be integrated to form a ‘knowledge network’, or structured package of related knowledge regarding a topic. /Core2) This can be envisaged (and modeled) as a network of nodes and relationships, and can be seen as an incarnation of the ‘intelligent data’ ideal, which the Semantic Web is meant to enable (Berners-Lee, 2001). The purpose of this project is to examine such a new form of structuring, and the authoring, editing and retrieval processes needed to use it.
Specifically, we are interested in representing conference proceedings in a new way. Semantic browsers such as PiggyBank [ ] and semantic collaborative authoring tools such as Semantic Wiki [ ] are paving the way for distributed, semantic communities to communicate. /Background)

Contribution
(Contribution:(Core3: We propose an open-standard, widely (re)useable format, the ABCDE Format (ABCDEF) for proceedings and workshops contributions that can be easily mined, integrated and consumed by semantic browsers and wikis. /Core3) The format can be authored in several forms: LaTeX, XML, a Microsoft Word template or a simple text file. It is characterised by the following elements:

A - Annotation. Each record contains a set of metadata that follows the Dublin Core standard. Minimal required fields are Title, Creator, Identifier and Date.
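A sketch of checking the minimal required fields, assuming element names shaped like those in the example annotation (`dc:Creator.PersonalName`, `dc:Creator.2`), where everything after the first dot is treated as a qualifier. Matching is case-insensitive so that lowercase Dublin Core names such as `dc:title` also count. The helper names `base_element` and `missing_required` are illustrative, not part of the format.

```python
# Minimal required Dublin Core elements for an ABCDE record.
REQUIRED = ("Title", "Creator", "Identifier", "Date")


def base_element(name: str) -> str:
    """'dc:Creator.PersonalName' -> 'Creator'; qualifiers after the first dot are dropped."""
    local = name.split(":", 1)[-1]
    return local.split(".", 1)[0]


def missing_required(elements: list[tuple[str, str]]) -> list[str]:
    """Return the required fields that have no non-empty value in (name, value) pairs."""
    present = {base_element(name).lower() for name, value in elements if value.strip()}
    return [field for field in REQUIRED if field.lower() not in present]
```

An empty result means the record meets the minimum; anything else lists what an author still has to fill in.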

B, C, D - Background, Contribution, Discussion. The main body of text consists of three sections:

* Background, describing the positioning of the research, ongoing issues and the central research question;
* Contribution, describing the work the authors have done: any concrete things created, programmed, or investigated;
* Discussion, containing a discussion of the work done, comparison with other work, and implications and next steps.

These section headings need to exist somewhere in the metadata of the article, but they can be hidden markup. Also, each of the sections can have different, and differently named, subheadings.

E - Entities. Throughout the text, entities such as references, personal names, project websites, etc. are identified by:

* The text linking to an entity (and/or its URI, e.g. in XPath)
* The type of link (reference, footnote, website, etc.)
* The linking URI, if present
* The text for the link

In other words, the entity link can be described as an RDF statement.
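One way to read the four parts of an entity link as RDF, sketched in plain Python that writes N-Triples lines: the location of the linking text becomes the subject, the link type becomes the predicate, the linking URI (if present) becomes the object, and the link text is attached with `rdfs:label`. The namespace `http://example.org/abcdef#` is a placeholder, since ABCDEF does not define a link-type vocabulary, and this whole mapping is an assumption rather than part of the format.

```python
from dataclasses import dataclass

# Hypothetical namespace for link types; ABCDEF does not define one.
ABCDEF_NS = "http://example.org/abcdef#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


@dataclass
class EntityLink:
    source: str     # URI locating the linking text (e.g. document URI + fragment)
    link_type: str  # single-word type: "reference", "footnote", "website", ...
    target: str     # the linking URI, or "" if there is none
    text: str       # the text for the link


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal (escapes backslash, quote, newline)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(link: EntityLink) -> list[str]:
    """Emit the link triple (only when a target URI exists) plus an rdfs:label triple."""
    triples = []
    if link.target:
        predicate = ABCDEF_NS + link.link_type
        triples.append(f"<{link.source}> <{predicate}> <{link.target}> .")
    triples.append(f"<{link.source}> <{RDFS_LABEL}> {literal(link.text)} .")
    return triples
```

For example, `to_ntriples(EntityLink("http://example.org/paper#p3", "website", "http://www.semwiki.org/2006/", "SemWiki 2006"))` yields one link triple and one label triple for that paragraph (the paragraph URI is made up). Link types must be single words so that the predicate stays a valid URI.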

(Core4: There is no abstract in an ABCDE document - instead, within the B,C and D paragraphs the author denotes 'core' sentences. Upon retrieval or rendering of the article, these can be extracted to form a structured abstract of the article - where one can jump directly to the core of the Background, Contribution or Discussion. /Core4)  /Contribution)
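A sketch of that retrieval step, assuming the parenthesised markup used in this example (`(Background: ... /Background)`, `(Core1: ... /Core1)`). It collects the core sentences per section in document order and renders them as an HTML list whose items link to `#background`, `#contribution` and `#discussion`; those anchor ids are a hypothetical assumption about how the rendered article is built. The core pattern also accepts the `(Core4:/` opening variant. Regular expressions suffice here only because the section and core closers are unique strings; a real implementation would work from the schema.

```python
import html
import re

# A section runs from "(Background:" to the matching "/Background)" closer.
SECTION = re.compile(r"\((Background|Contribution|Discussion):(.*?)/\1\)", re.S)
# A core sentence runs from "(CoreN:" to "/CoreN)"; the optional "/" also
# accepts the "(Core4:/" opening variant.
CORE = re.compile(r"\(Core(\d+):/?\s*(.*?)\s*/Core\1\)", re.S)


def structured_abstract(doc: str) -> list[tuple[str, str]]:
    """Return (section, core sentence) pairs in document order."""
    pairs = []
    for section, body in SECTION.findall(doc):
        for _number, sentence in CORE.findall(body):
            pairs.append((section, sentence))
    return pairs


def to_html(pairs: list[tuple[str, str]]) -> str:
    """Render the pairs as a list whose items jump to (assumed) section anchors."""
    items = [
        f'<li><a href="#{section.lower()}">{section}</a>: {html.escape(sentence)}</li>'
        for section, sentence in pairs
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Applied to this example, the sketch would return Core1 and Core2 under Background, Core3 and Core4 under Contribution and Core5 under Discussion, the same order as the summary list at the top of the example.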

Discussion
(Discussion: (Core5: We aim to work on different incarnations of this format and open it up to modification and development. /Core5) The point is to offer a flexible structure that can live in semantic environments such as Semantic Wikis (SemWeb, OntoWeb) and browsers (such as Haystack or PiggyBank). The aim is to enhance the discovery and integration of information by adding markup, by and for the Semantic Web community. One possible development would be a conference program consisting of "core contribution" sentences that link to the contributions, as a quick way to browse the papers presented. Another would be to mine all the links to a project website and connect them to that website, each linked to the paragraph in the contribution where the project was mentioned.

/Discussion)--Anita 15:50, 14 February 2006 (CET)
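The second example in the Discussion, collecting all links to a project website together with the paragraphs they come from, amounts to building a backlink index. A minimal sketch, assuming entity links have already been extracted as (paper id, paragraph id, target URI) tuples; the tuple layout and the function name `backlinks` are hypothetical.

```python
from collections import defaultdict


def backlinks(links: list[tuple[str, str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Group entity links by target URI.

    Each input tuple is (paper id, paragraph id, target URI); the result maps
    every target URI to the (paper id, paragraph id) pairs that link to it,
    in input order.
    """
    index: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for paper_id, paragraph_id, target in links:
        index[target].append((paper_id, paragraph_id))
    return dict(index)
```

A project website could then list, for its own URI, every paragraph in the proceedings that mentions it.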