Benutzer:Duesentrieb/Semantic Wiki Web

From Wikipedia, the free encyclopedia

A rough collection of ideas about making categories work, or what to abandon them for. This is intended as an elaborate rant or brainstorming on the topic - it's what I would do if I were King. Discussion about the application of categories should continue on Wikipedia Diskussion:Kategorien - but if you would like to talk about the ideas presented here, you are welcome to do so on this page's discussion page.

This page is permanently under construction.

  • First of all, I was going to post the conversation from #mediawiki here - but I lost the log (actually, I forgot to set up my X-Chat to log things...). So if anyone has a log of #mediawiki from 10 Aug, around 2:00 AM CEST (0:00 UTC), please tell me or just post it here.
OK, I have the log now... now I need the permission of the folks involved: Head, Elian, Hemanshu, Datura, Sesse, Brion, Cyrius, TimStarling. So if any of you read this, please let me know whether you are OK with me publishing our talk from yesterday here. If you want to review it first, I'll send it to you so you can check. -- D. Düsentrieb (?!) 21:03, 10. Aug 2004 (CEST)

Core Problems

The core problems I would like to address on this page are, among others:

  • There are different "types" of categories
  • Categories relate in different ways to other categories
  • There are different dimensions over which Articles can be categorized
  • Or, to sum up all of the above: articles can be related to other articles in different ways.

I will talk about those issues in detail later. However, first I would like to mention the core features I think we need most urgently to address our current problems:

  • A naming convention that makes clear which dimension (aka axis or facet) a category is used in, to avoid confusion and misclassification.
  • Search over cross-sections of categories, and a sensible syntax for linking to such sections.
  • A concept of implicit membership in categories, such that members of a sub-category are automatically members of the parent category (or categories).
  • A distinction between a category being a sub-category of another category and merely belonging to it, just as articles do.

Below, I try to take a more detailed look at the problems and possible solutions.

Facet Classification

This section is about some conventions for naming and applying categories, intended to provide more structure and clarity. The basic idea is to classify articles along distinct dimensions. I would also suggest adding a prefix indicating the axis to each category - here is why:

  • Prefixes indicate what belongs in a category and what does not. They also take care of ambiguities: Karl der Grosse may belong to Topic:Germany, but not really to Place:Germany, etc. (although this could also be handled using different techniques - see the sections further down this page).
  • Prefixes give the software a way to differentiate between different "types" of categories: this way, it would be easy to create an overview by article kind, by topic, etc. If a multi-dimensional search is implemented, it would even be possible to offer the user a (possibly hierarchical) selection box.
  • This would be a primitive way to state the type of a relation - as a first step towards the ontology-model discussed later on.

The following dimensions have emerged from the discussion so far:

  • Kind (or Type or Object): What kind of thing the article is about. I would suggest an extensible hierarchy of kinds, like this:
      • Kind:Thing
          • Kind:Living Thing
              • Kind:Plant
              • Kind:Animal
              • Kind:Person
      • Kind:Idea
          • Kind:Mathematical Formula
          • Kind:Philosophical Idea
      • Kind:Organisation
          • Kind:NGO
          • Kind:Government
          • Kind:Church
      • ...
  • Topic (or Subject): A field of study or general topic the article belongs to. This, too, should be a hierarchy (or rather an acyclic graph, since a topic may have more than one parent), along academic disciplines and other fields of interest. A single article may be assigned to multiple topics. Sample hierarchy:
      • Topic:Science
          • Topic:Math
              • Topic:Number Theory
          • Topic:Biology
              • Topic:Neurology (first reference!)
          • Topic:Medicine
              • Topic:Neurology (second reference!)
      • Topic:Religion
          • Topic:Occultism
          • Topic:Christianity
              • Topic:Catholicism
              • Topic:Protestantism
          • Topic:Islam
      • Topic:Art
          • Topic:Music
          • Topic:Poetry
      • ...

It was argued that along this axis, no prefix should be used for the categories.

  • Place (or Space): Where the object is. This should only be applied to things that have a fixed location - other relations to a place should be expressed along a different axis ("Topic" or "Keyword"). An example hierarchy for this axis would be:
      • Place:Universe
          • Place:Milky Way
              • Place:Sol System
                  • Place:Mars
                  • Place:Earth
                      • Place:Europe
                          • Place:France
                      • Place:America
                          • Place:USA
                              • Place:Texas
      • ...
  • Time: The time to which an item belongs. This makes sense specifically when talking about events. For other things (like people, songs, etc.), the time of existence or popularity could be expressed along this axis. Here, too, an extensible hierarchy would make sense - using the Gregorian calendar for recent times, and the geological or astronomical timescale for longer periods:
      • Time:Hadean
      • ...
      • Time:Pleistocene
      • Time:Holocene
          • Time:10th Millennium BC
          • Time:9th Millennium BC
          • ...
          • Time:1st Millennium AD
          • Time:2nd Millennium AD
              • Time:11th Century
              • ...
              • Time:20th Century
                  • Time:1900s
                  • ...
                  • Time:1990s
          • Time:3rd Millennium AD
              • Time:2000s
              • ...
  • Meta (or Wikipedia): This is for categories that describe properties of the article itself, not of what the article is about, like so:
      • Wikipedia:Work
          • Wikipedia:Stub
          • Wikipedia:Biased
          • Wikipedia:Dead End
          • Wikipedia:Cryptic
      • Wikipedia:Quality
          • Wikipedia:Excellent
      • Wikipedia:Help
      • ...

Based on this concept, a multi-dimensional search could be implemented, allowing users to narrow down results using time/space/etc. coordinates. See the following section for more ideas about this.
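To make the idea of a search along several axes a bit more concrete, here is a minimal Python sketch. It assumes every article simply carries a set of prefixed category names of the form "Axis:Name"; the function names, the sample articles and their category assignments (including "Kind:Event" and "Time:1800s") are made up for illustration and are not part of MediaWiki. It only looks at direct assignments and ignores the hierarchies within each axis.

```python
from collections import defaultdict


def split_facets(categories):
    """Group prefixed category names ("Axis:Name") by their axis."""
    facets = defaultdict(set)
    for cat in categories:
        axis, _, name = cat.partition(":")
        facets[axis].add(name)
    return dict(facets)


def facet_search(articles, **wanted):
    """Return the titles of all articles that match every requested axis value."""
    hits = []
    for title, categories in articles.items():
        facets = split_facets(categories)
        if all(value in facets.get(axis, set()) for axis, value in wanted.items()):
            hits.append(title)
    return sorted(hits)


# Hypothetical sample data, for illustration only.
ARTICLES = {
    "Eiffel Tower": {"Kind:Thing", "Place:France", "Time:1800s"},
    "Charles de Gaulle": {"Kind:Person", "Place:France", "Time:1900s"},
    "Apollo 11": {"Kind:Event", "Topic:Science", "Time:1900s"},
}

print(facet_search(ARTICLES, Time="1900s"))
print(facet_search(ARTICLES, Kind="Person", Time="1900s"))
```

Because the axis is part of the category name, the software can offer one selection box per axis without any extra metadata.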

Sections of Sets and Transitive Closures

Here are some ideas about how to make categories more usable. I think the features suggested here could be easily implemented, and I hope they soon will be.

So here is what I think we need:

  • A way to search for cross-sections of categories, and to link to such cross-sections as if they were regular categories. That way, we wouldn't need categories like Politicians in the 20th century or female soccer players: those could be created on the fly from links like [[:category:20th century && politician]] and [[:category:soccer player && woman]], respectively. The page resulting from such a link should contain, besides the obvious list of members, links to the categories it was composed from.
  • A way to get all members of a category and all of its sub-categories (the implicit members, or formally the transitive closure of a category). That way, a politician would need just the category:politician and not also the category:person, category:mammal, etc. As above, there should be a way to link to such categories, e.g. like this: [[:category:Person*]] would link to all people, even if they were only contained in a sub-category of person, like politician, author, pope, etc.
  • The above suggestion implies interpreting the sub-category relation as "is subset of". But as we see from experience, we need to be able to include categories in other categories *without* implying a subset. Thus, it would be best to allow a category to be included in another category by two distinct relations: as a member (just as articles belong to categories), or as a sub-category (subset) - the latter could be written as [[part of category:person]]. This would allow us to include Hamburg in the category Places in Germany, and the Hamburger Aalsuppe in Hamburg, without implying that the soup is a place in Germany.
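The three points above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: categories are plain dictionaries in memory, `MEMBERS` holds the direct members (articles, or categories included as mere members) and `SUBCATEGORY_OF` holds the subset relation. All names and the sample data are hypothetical.

```python
# Direct members of each category (hypothetical sample data).
MEMBERS = {
    "person": set(),
    "politician": {"John F. Kennedy"},
    "author": {"Douglas Adams"},
    "20th century": {"John F. Kennedy", "Douglas Adams"},
    "Places in Germany": {"Hamburg"},   # Hamburg is a member here, not a subset
    "Hamburg": {"Hamburger Aalsuppe"},
}

# Subset relation: sub-category -> parent categories.
SUBCATEGORY_OF = {
    "politician": {"person"},
    "author": {"person"},
}


def closure(category):
    """All direct and implicit members, i.e. 'category*' in the proposed syntax."""
    members, seen, stack = set(), set(), [category]
    while stack:
        current = stack.pop()
        if current in seen:          # guards against cycles in the subset graph
            continue
        seen.add(current)
        members |= MEMBERS.get(current, set())
        stack.extend(
            child for child, parents in SUBCATEGORY_OF.items() if current in parents
        )
    return members


def cross_section(*member_sets):
    """Members common to all given sets, i.e. 'a && b' in the proposed syntax."""
    return set.intersection(*member_sets) if member_sets else set()


print(closure("person"))
print(cross_section(MEMBERS["20th century"], MEMBERS["politician"]))
print(closure("Places in Germany"))
```

Since Hamburg is only a member of Places in Germany and not a sub-category, the closure of Places in Germany contains Hamburg but not the soup.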

Building an Ontology

This is about making the Wikipedia into an encyclopedia for computers: an ontology. An ontology is a way to express knowledge formally, so that programs can use it to perform deductive reasoning. To me, the idea of making the Wikipedia into a collection of knowledge usable by both humans and computers seems quite natural.

This would allow very precise searching, for example to extract a portion of the Wikipedia by time, topic, or some other relation like "all works of Author FooBar".

The basic concept is borrowed from RDF: categories are abandoned completely; instead, articles are connected by an arbitrary number of relations, which can be expressed as triples of (from/relation-type/to). The relation types should be user-definable, just as categories are now. They would themselves have a description page, and (nearly always) relations to other relations. In particular, a relation can be derived from other relations, so that applying one to an article implies the others.

Here are a few examples of relations that I would like to have:

  • instance of: Like the Kind axis above: John F. Kennedy is an instance of Person, which implies that he is also an instance of Living Thing.
  • is component of: Like the hierarchy expressed in the Time and Place axes above: Germany is a component of Europe, CPU is a component of Computer, etc.
  • is member of: for instance, Joschka Fischer is member of Government of Germany, etc.
  • is subclass of: especially interesting for relating category-like articles and for allowing inference (implicit relations): Pope is subclass of Person, and Person is subclass of Living Thing, etc.
  • implies: similar (or identical?) to the above, especially useful for relations: is mother of implies is parent of, is parent of implies is relative of, etc.
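How "is subclass of" and "implies" produce implicit relations can be illustrated with a naive fixed-point loop in Python. This is a sketch only: the rule set is my reading of the examples above, the function name `infer` is made up, and a real system would want something far more efficient than comparing every pair of triples.

```python
def infer(triples):
    """Apply the inference rules repeatedly until no new triple appears."""
    facts = set(triples)
    while True:
        new = set()
        for s, r, o in facts:
            for s2, r2, o2 in facts:
                # Subclass is transitive: A sub B and B sub C gives A sub C.
                if r == "is subclass of" and r2 == "is subclass of" and o == s2:
                    new.add((s, "is subclass of", o2))
                # An instance of X is also an instance of every superclass of X.
                if r == "instance of" and r2 == "is subclass of" and o == s2:
                    new.add((s, "instance of", o2))
                # If relation r implies relation o2, (s r o) gives (s o2 o).
                if r2 == "implies" and s2 == r:
                    new.add((s, o2, o))
        if new <= facts:      # nothing new was derived: fixed point reached
            return facts
        facts |= new


facts = infer({
    ("John F. Kennedy", "instance of", "Person"),
    ("Pope", "is subclass of", "Person"),
    ("Person", "is subclass of", "Living Thing"),
    ("is mother of", "implies", "is parent of"),
    ("is parent of", "implies", "is relative of"),
    ("Rose Kennedy", "is mother of", "John F. Kennedy"),
})

print(("John F. Kennedy", "instance of", "Living Thing") in facts)
print(("Rose Kennedy", "is relative of", "John F. Kennedy") in facts)
```

Note that the relation names themselves appear as subjects of "implies" triples, which matches the idea that relations have their own description pages and relations to other relations.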

Those are the basics. But it is important to allow users to define new relations, so that a fine-grained semantic web can be created. Such specific relations could be:

  • is similar to
  • is synonym of (this could even be used to implement redirects)
  • is opposite of
  • see also

or, even more specific:

  • is author of
  • is president of
  • is propulsion of
  • ...

As to the use and application: I imagine a syntax like this:

[[Is Instance of:car]] in the article Chevrolet, rendered as Is Instance of car.

Clicking on car takes you to that article; clicking on Is Instance of takes you to the description page of the relation. Those links could be displayed at the bottom of the article, just as is done now with the categories.
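Extracting such links from the article source could look roughly like the following Python sketch. The regular expression and the function name are assumptions of mine, not MediaWiki code. Because ordinary namespace links such as [[Image:...]] have the same shape, the sketch only accepts prefixes that appear in a set of known relation names.

```python
import re

# Matches [[Prefix:Target]]; links starting with a colon or containing a pipe are skipped.
LINK = re.compile(r"\[\[([^\[\]:|]+):([^\[\]|]+)\]\]")


def extract_relations(article, wikitext, known_relations):
    """Return (from, relation-type, to) triples for all relation links in the text."""
    triples = []
    for relation, target in LINK.findall(wikitext):
        relation, target = relation.strip(), target.strip()
        if relation.lower() in known_relations:
            triples.append((article, relation, target))
    return triples


source = "Some text. [[Is Instance of:car]] [[Image:Example.jpg]] [[:category:cars]]"
print(extract_relations("Chevrolet", source, {"is instance of"}))
```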

Some thoughts on the implementation: all relations can be expressed as a single table of triples, which is easily done in a database. However, because of the extensive use of implicit relationships in such a system, the trivial implementation would be extremely inefficient. The reasoning about the implications would have to be done whenever the relations change, not every time a relation is requested. This could be done by storing all implicit relationships in the database too - but that would be a lot of data.
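As a rough illustration of that trade-off, here is a Python sketch using the standard sqlite3 module. The table layout, the `derived` flag and the function names are assumptions for this example. It handles only "instance of" and "is subclass of", and for brevity it rebuilds all derived rows on every write, which a real implementation would do incrementally.

```python
import sqlite3


def open_store():
    """Create an in-memory triple table; derived = 1 marks implicit relations."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE triple (subject TEXT, relation TEXT, object TEXT, "
        "derived INTEGER NOT NULL DEFAULT 0, "
        "PRIMARY KEY (subject, relation, object))"
    )
    return db


def count(db):
    return db.execute("SELECT COUNT(*) FROM triple").fetchone()[0]


def rematerialize(db):
    """Recompute all implicit triples; runs on write, never on read."""
    db.execute("DELETE FROM triple WHERE derived = 1")
    while True:
        before = count(db)
        db.execute(
            "INSERT OR IGNORE INTO triple "
            "SELECT a.subject, a.relation, b.object, 1 "
            "FROM triple a JOIN triple b ON a.object = b.subject "
            "WHERE b.relation = 'is subclass of' "
            "AND a.relation IN ('instance of', 'is subclass of')"
        )
        if count(db) == before:   # no new rows: closure is complete
            break
    db.commit()


def add_triple(db, subject, relation, obj):
    db.execute(
        "INSERT OR REPLACE INTO triple VALUES (?, ?, ?, 0)", (subject, relation, obj)
    )
    rematerialize(db)


def objects(db, subject, relation):
    """Plain lookup: explicit and implicit relations are read the same way."""
    rows = db.execute(
        "SELECT object FROM triple WHERE subject = ? AND relation = ? ORDER BY object",
        (subject, relation),
    )
    return [row[0] for row in rows]


db = open_store()
add_triple(db, "Pope", "is subclass of", "Person")
add_triple(db, "Person", "is subclass of", "Living Thing")
add_triple(db, "John F. Kennedy", "instance of", "Person")
print(objects(db, "John F. Kennedy", "instance of"))
```

Reads stay cheap because they never reason, at the cost of extra rows: three explicit triples already produce two derived ones here.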

Minimalistic Solution using Relations

As a minimalistic approach to this, I would suggest introducing the following relations:

  • Article is Category: To express an instance-of relation, along the Kind axis. Examples: George W. Bush is Person, Buddhism is Religion, Informatics is Science, etc.
  • Article is component of Category or Article: To express part-to-whole relations. Examples: Germany is component of Europe, Engine is component of Car, Optics is component of Physics, etc.
  • Article belongs to Keyword: Expresses a loose relation, so that the Keyword serves as a bibliography for some topic. The Article is not classified as the Keyword, merely associated with it.

I believe that those relations would cover most of the things we would need.

(more to come)