User:Morwen/Category Targets

From Wikipedia, the free encyclopedia

Wikipedia categories are a mess, at least as far as machine-readable information is concerned. Ironically, the Stub sorting WikiProject has actually come up with a better and more useful categorisation system, which runs alongside the main one.

Basic objectives

For each article, it should be possible to determine what kind of object it describes. Root elements would be:

  • Person
  • Organisation
    • Company
  • Location
    • Area
    • Settlement
  • Building
  • Artistic etc Work
    • Book
    • Film
    • TV episode
    • TV series

Alongside this hierarchy would be another hierarchy for fictional entities.

There is presently no way of obtaining this information programmatically, because categories do not imply an 'is a' relationship. For example, Fungus the Bogeyman is in Category:Children's books, which is in Category:Books by genre, which is in Category:Books, so you can conclude that Fungus the Bogeyman is a book. However, the same logic also leads to the conclusion that Alice in Wonderland (1985 film) is a book, because it is in Category:Alice in Wonderland, which is a subcategory of Category:Children's books.
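
The failure can be reproduced with a toy category graph. This is only a minimal sketch: the PARENTS dictionary and the function names are hypothetical, and the graph contains just the memberships mentioned above rather than anything read from a real dump.

```python
# Toy membership map: page or category -> categories it is directly in.
# Only the memberships named in the text are included.
PARENTS = {
    "Fungus the Bogeyman": ["Category:Children's books"],
    "Alice in Wonderland (1985 film)": ["Category:Alice in Wonderland"],
    "Category:Alice in Wonderland": ["Category:Children's books"],
    "Category:Children's books": ["Category:Books by genre"],
    "Category:Books by genre": ["Category:Books"],
}


def ancestors(page):
    """Return every category reachable by following memberships upwards."""
    seen = set()
    stack = [page]
    while stack:
        node = stack.pop()
        for parent in PARENTS.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def naive_is_book(page):
    """Treat category membership as 'is a', which is the flawed assumption."""
    return "Category:Books" in ancestors(page)
```

Under this naive rule both the book and the 1985 film come out as books, which is exactly the problem.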

Ultimately, this should be machine-readable, so that with a database dump you can do a query to get a list of all biographies.

Stub-sorting as a model. Every page gets tagged as 'requiring an ontology', and then gets, say, {{bio-article}} stuck at the end of it. These will say something like

This biographical article is uncategorised. You can help Wikipedia by making the categorisation more specific.

or {{geo-article}} might read

This geographic article is uncategorised. You can help Wikipedia by making the categorisation more specific.

The articles would go into Category:Uncategorised people and Category:Uncategorised places.

So, someone would come along and find the article about Bill Clinton and would replace {{bio-article}} with, say,

{{pol-bio-article|nationality=United States|born=1946|died=alive}}. This would put it in all the required categories (Category:United States politicians, Category:1946 births and so forth).
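
As a rough illustration of how such a tag could expand into categories, here is a minimal sketch. The function name pol_bio_categories is hypothetical, and the "deaths" category for a non-living subject is an assumption that the text above does not spell out.

```python
def pol_bio_categories(nationality, born, died):
    """Hypothetical expansion of a pol-bio-article tag into category names."""
    categories = [
        f"Category:{nationality} politicians",
        f"Category:{born} births",
    ]
    # Assumption: a year of death would map to a matching "deaths" category.
    if died != "alive":
        categories.append(f"Category:{died} deaths")
    return categories


clinton = pol_bio_categories("United States", "1946", "alive")
```

For the Bill Clinton tag this yields Category:United States politicians and Category:1946 births.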

Another example might be Alton Towers, which would end up with a tag saying something like

{{visitor-attraction-article|place1=[[Alton]]|place2=[[Staffordshire]]|type=theme park}}

This would then place it in Category:Tourist attractions in Staffordshire and Category:Theme parks.
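
A tag like this is easy to read by machine. The sketch below is only an illustration: parse_tag and visitor_attraction_categories are hypothetical names, the parser assumes no parameter value contains a "|" (so piped links would break it), and the pluralisation rule (capitalise and append "s") is a naive assumption.

```python
def parse_tag(tag):
    """Split a simple {{name|key=value|...}} tag into its name and parameters."""
    inner = tag.strip()[2:-2]  # drop the surrounding {{ and }}
    name, *parts = inner.split("|")
    params = {}
    for part in parts:
        key, _, value = part.partition("=")
        # Strip the [[...]] link brackets around a value, if present.
        params[key.strip()] = value.strip().strip("[]")
    return name.strip(), params


def visitor_attraction_categories(params):
    """Hypothetical mapping from tag parameters to category names."""
    kind = params["type"]
    return [
        f"Category:Tourist attractions in {params['place2']}",
        f"Category:{kind[0].upper()}{kind[1:]}s",  # naive plural
    ]


TAG = ("{{visitor-attraction-article|place1=[[Alton]]"
       "|place2=[[Staffordshire]]|type=theme park}}")
name, params = parse_tag(TAG)
categories = visitor_attraction_categories(params)
```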

Some other things - we can't have this chaos where we have Category:Districts of London, Category:Areas of Leicester and Category:Suburbs of Sydney. There needs to be a kind of grid system for categories, so that we have standard subdivisions. It therefore differs from the stub sorting project in that categories will be pre-created, rather than created on demand.
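
One way to picture the grid is as the product of a fixed list of subdivision terms and a list of places. The sketch below is hypothetical throughout: the choice of "Districts" as the single standard term is an assumption made purely for illustration, not a proposal from the text.

```python
from itertools import product

# Assumption: one standard term replaces "Districts", "Areas" and "Suburbs".
SUBDIVISION_KINDS = ["Districts"]
PLACES = ["London", "Leicester", "Sydney"]


def grid_categories(kinds, places):
    """Pre-create one category name per (subdivision kind, place) cell."""
    return [f"Category:{kind} of {place}" for kind, place in product(kinds, places)]


grid = grid_categories(SUBDIVISION_KINDS, PLACES)
```

Because every cell of the grid exists in advance, a program can construct the category name for any place without first discovering which term that place happens to use.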

The main point of this is that you can programmatically go upwards to find out what kind of entity an article is about, programmatically go downwards to find all things that belong in a certain category (all books, not including any fictional characters from books), and programmatically find things that aren't categorised properly (because they have no -article tag).
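
The three operations can be sketched over a small in-memory model. Everything here is hypothetical: the TYPE_PARENT and ARTICLE_TYPE tables stand in for whatever a database dump would provide, the types assigned to each article are illustrative, and "Example untagged page" is an invented placeholder.

```python
# Hypothetical type hierarchy: child type -> parent type (roots are absent).
TYPE_PARENT = {
    "Company": "Organisation",
    "Area": "Location",
    "Settlement": "Location",
    "Book": "Artistic etc Work",
    "Film": "Artistic etc Work",
}

# Hypothetical tagging: article -> type from its -article tag (None = no tag).
ARTICLE_TYPE = {
    "Bill Clinton": "Person",
    "Fungus the Bogeyman": "Book",
    "Alice in Wonderland (1985 film)": "Film",
    "Example untagged page": None,
}


def type_chain(article):
    """Go upwards: the article's type followed by each ancestor type."""
    chain = []
    current = ARTICLE_TYPE.get(article)
    while current is not None:
        chain.append(current)
        current = TYPE_PARENT.get(current)
    return chain


def articles_of_type(type_name):
    """Go downwards: every article whose type is type_name or a subtype."""
    return sorted(a for a in ARTICLE_TYPE if type_name in type_chain(a))


def untagged():
    """Find articles that carry no -article tag at all."""
    return sorted(a for a, t in ARTICLE_TYPE.items() if t is None)
```

Since each article carries exactly one type and each type has at most one parent, the 'is a' relationship holds at every step, so asking for all books no longer returns the film.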

The actual examples and ideas here are secondary to those three requirements.