Benutzer:Johannes Kroll (WMDE)/TLG Filterentwicklung Howto-en


(A German version of this page is available.)

Creating a Custom Filter for the Task List Generator

The Task List Generator (TLG) can be extended with custom filter modules. To add a filter, you write a filter class in a separate file and register it with the tool. No changes to the rest of the tool's code should be necessary.

Filter Modules

Filter modules are located in the subdirectory filtermodules. Each module contains one or more filter classes.

Filter Classes

Each filter class inherits from FlawFilter. A filter class must define the attributes shortname, label and description (see Example Code). Each filter class is made available by calling FlawFilters.register() when its module is loaded.
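To illustrate the registration pattern, here is a self-contained sketch. The FlawFilter and FlawFilters classes below are hypothetical stand-ins written for this example, not the TLG's real classes from tlgflaws. They only show how a register() call at import time makes a filter known under its shortname, and how the required attributes might be checked.

```python
# Hypothetical stand-ins for tlgflaws.FlawFilter / tlgflaws.FlawFilters,
# written only to illustrate registration at module load time.

class FlawFilter(object):
    """Base class; concrete filters set shortname, label and description."""
    shortname = None
    label = None
    description = None


class FlawFilters(object):
    """Minimal registry mapping shortname -> filter class (hypothetical)."""
    _registry = {}

    @classmethod
    def register(cls, filterclass):
        # Refuse classes that lack one of the required attributes.
        for attr in ('shortname', 'label', 'description'):
            if not getattr(filterclass, attr, None):
                raise ValueError('%s lacks required attribute %r'
                                 % (filterclass.__name__, attr))
        cls._registry[filterclass.shortname] = filterclass
        return filterclass

    @classmethod
    def get(cls, shortname):
        return cls._registry[shortname]

    @classmethod
    def shortnames(cls):
        return sorted(cls._registry)


class FExample(FlawFilter):
    shortname = 'Example'
    label = 'Example filter'
    description = 'Illustrates registration.'


# This line runs once, when the module containing the filter is imported.
FlawFilters.register(FExample)
```

Because registration happens as a side effect of importing the module, simply loading every module in filtermodules is enough to make all its filters available.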

Filter Actions

The TLG first executes the query string provided by the front end. The result is a list of page_id values (primary keys of the Wikipedia page table). These page_ids are then processed by filter actions: the filter class's createActions() method creates action objects whose classes inherit from TlgAction, and each action's execute() method checks its pages and puts matches into a result queue. Usually an action processes more than one page_id; the filter's getPreferredPagesPerAction() method tells the TLG how many.
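The following sketch shows one plausible way the page_ids could be split into batches of getPreferredPagesPerAction() pages, with one createActions() call per batch. The dispatch_pages() function, the toy filter and the 'dewiki' identifier are hypothetical; they only mirror the method names used in the Example Code section, and the TLG's actual scheduling code may differ.

```python
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2


def dispatch_pages(flawfilter, wiki, page_ids, action_queue):
    """Hypothetical driver: split page_ids into batches and let the
    filter create actions for each batch."""
    size = flawfilter.getPreferredPagesPerAction()
    for start in range(0, len(page_ids), size):
        batch = page_ids[start:start + size]
        flawfilter.createActions(wiki, batch, action_queue)


class ToyFilter(object):
    """Stand-in with the two methods the driver relies on."""
    def getPreferredPagesPerAction(self):
        return 100

    def createActions(self, wiki, pages, actionQueue):
        # A real filter enqueues TlgAction objects here;
        # this toy version enqueues (wiki, pages) tuples instead.
        actionQueue.put((wiki, pages))


q = queue.Queue()
dispatch_pages(ToyFilter(), 'dewiki', list(range(1, 251)), q)
batches = [q.get() for _ in range(q.qsize())]
print([len(pages) for _, pages in batches])   # [100, 100, 50]
```

Larger batches mean fewer SQL round trips per filter (one IN (...) query per batch), at the cost of coarser work distribution across threads.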

SQL Cursors

MySQLdb cursors must be kept separately for each thread in the TLG. All cursors use the DictCursor class, so rows are returned as dictionaries keyed by column name.

Cursor for the wiki selected in the front end

SQL connections and cursors for the selected wiki are kept per thread. Inside action classes, the cursor can be retrieved with utils.getCursors()[self.wiki].
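MySQLdb connections must not be shared between threads, which is why each thread needs its own cursor. The sketch below builds such a per-thread cache on threading.local. It is an illustration, not the TLG's getCursors(): sqlite3 stands in for MySQLdb so it runs without a database server, the 'dewiki' key and the get_cursors name are hypothetical, and sqlite3.Row plays the role of DictCursor.

```python
import sqlite3
import threading

# One attribute namespace per thread: values set in one thread are
# invisible in all others.
_local = threading.local()


def get_cursors():
    """Return this thread's {wiki: cursor} dict, creating it on first use.

    Hypothetical analogue of utils.getCursors(); the in-memory database
    is a placeholder for a real MySQL connection.
    """
    cursors = getattr(_local, 'cursors', None)
    if cursors is None:
        conn = sqlite3.connect(':memory:')
        conn.row_factory = sqlite3.Row   # rows addressable by column name
        cursors = {'dewiki': conn.cursor()}
        _local.cursors = cursors
    return cursors


main_cursor = get_cursors()['dewiki']
main_cursor.execute('SELECT 1 AS page_id')
print(main_cursor.fetchone()['page_id'])   # 1

results = []

def worker():
    # A new thread gets its own connection and cursor.
    results.append(get_cursors()['dewiki'] is main_cursor)

t = threading.Thread(target=worker)
t.start()
t.join()
print(results)                                  # [False]
print(get_cursors()['dewiki'] is main_cursor)   # True
```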

Other Cursors

When cursors for other databases are needed, use the TempCursor class together with a with statement. These cursors are cached per thread and deleted automatically. An example of this use can be found in changedetector.py.

Utility Functions and Caching

utils.py contains helper functions for commonly used database queries. They cache their results with beaker. To avoid unnecessary round trips to the database server, use these functions rather than rewriting the queries yourself.
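The benefit of caching such lookups can be sketched with the standard library. This example swaps in functools.lru_cache (Python 3) for beaker, which the TLG actually uses, so that it runs without extra packages; the page table contents and the page_title_for() function are hypothetical. A counter records how many queries actually reach the database.

```python
import functools
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT)')
conn.executemany('INSERT INTO page VALUES (?, ?)',
                 [(1, 'Berlin'), (2, 'Hamburg')])

db_queries = 0   # counts real database round trips


@functools.lru_cache(maxsize=1024)
def page_title_for(page_id):
    """Look up a page title; repeated calls are served from the cache."""
    global db_queries
    db_queries += 1
    row = conn.execute('SELECT page_title FROM page WHERE page_id = ?',
                       (page_id,)).fetchone()
    return row[0] if row else None


titles = [page_title_for(pid) for pid in (1, 2, 1, 1, 2)]
print(titles)       # ['Berlin', 'Hamburg', 'Berlin', 'Berlin', 'Hamburg']
print(db_queries)   # 2
```

Unlike beaker, lru_cache has no expiry time, so it can serve stale data for values that change in the wiki; that is one reason to use the existing beaker-backed helpers instead.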

Example Code

The following is a simple filter which detects pages that were modified today, using the page_touched field.

recent.py

#!/usr/bin/python
# -*- coding:utf-8 -*-
import time
from tlgflaws import *

## A filter which finds all pages which were changed today.
class FRecentlyChanged(FlawFilter):
    shortname= 'RecentlyChanged'                # Name which identifies the filter (don't translate!)
    label= _('Recently Changed')                # Label to be displayed in the front end next to the check box
    description= _('Page was touched today.')   # Longer description text for tool tips
    group= _('Timeliness')                      # Group the filter belongs to.

    # Action class for this filter
    class Action(TlgAction):
        
        # execute() filters the pages and puts matches into resultQueue.
        def execute(self, resultQueue):
            cur= getCursors()[self.wiki]
            # nothing to do for an empty batch ("IN ()" is invalid SQL)
            if not self.pageIDs:
                return
            # generate one %s placeholder per page
            format_strings = ','.join(['%s'] * len(self.pageIDs))
            # beginning of today in MediaWiki timestamp format (page_touched is stored in UTC)
            today= time.strftime( '%Y%m%d000000', time.gmtime() )
            params= []
            params.extend(self.pageIDs)
            params.append(today)
            # find subset of pages which were changed today
            cur.execute('SELECT * FROM page WHERE page_id IN (%s) AND page_touched >= %%s' % format_strings, params)
            changed= cur.fetchall()
            # return all the pages we found
            for row in changed:
                resultQueue.put(TlgResult(self.wiki, row, self.parent))

    # we want to process 100 pages per action. 
    def getPreferredPagesPerAction(self):
        return 100

    # create an action.
    def createActions(self, wiki, pages, actionQueue):
        actionQueue.put(self.Action(self, wiki, pages))

# register the filter when the module is loaded:
FlawFilters.register(FRecentlyChanged)
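The query-building step in execute() can be tried outside the TLG. The sketch below reproduces it against an in-memory sqlite3 table, an assumption made so it runs anywhere: sqlite3 uses ? placeholders where MySQLdb uses %s, and the table holds only the two columns needed. It computes the start of today in UTC, since MediaWiki stores timestamps such as page_touched in UTC in YYYYMMDDHHMMSS format. The pages_touched_since() name is hypothetical.

```python
import sqlite3
import time


def pages_touched_since(conn, page_ids, since):
    """Return the subset of page_ids whose page_touched >= since."""
    if not page_ids:
        return []          # "IN ()" would be a syntax error
    placeholders = ','.join(['?'] * len(page_ids))
    sql = ('SELECT page_id FROM page WHERE page_id IN (%s) '
           'AND page_touched >= ?' % placeholders)
    rows = conn.execute(sql, list(page_ids) + [since]).fetchall()
    return sorted(r[0] for r in rows)


# Beginning of today (UTC) in MediaWiki timestamp format.
today = time.strftime('%Y%m%d000000', time.gmtime())

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE page (page_id INTEGER, page_touched TEXT)')
conn.executemany('INSERT INTO page VALUES (?, ?)', [
    (1, '20000101120000'),                               # long ago
    (2, today[:8] + '093000'),                           # today, 09:30 UTC
    (3, time.strftime('%Y%m%d%H%M%S', time.gmtime())),   # just now
])
print(pages_touched_since(conn, [1, 2, 3], today))   # [2, 3]
```

Because MediaWiki timestamps are fixed-width digit strings, a plain string comparison (>=) orders them correctly.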

License

The tool and filters are licensed under the GPL.

Internationalization

User-visible strings should be written in English and wrapped in _() so that they can be translated with gettext. Optionally, you can provide a translation file (.po) with entries in the following form:

msgid "Some text as it appears in the program"
msgstr "Text, wie er im Programm auftaucht"
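At run time, strings wrapped in _() are looked up in a compiled message catalog, with the English msgid as the key. A minimal sketch using Python's standard gettext module follows. The domain name 'tlg' and the locale directory are hypothetical; with fallback=True the English string is returned unchanged whenever no catalog is found, as in this sketch. A .po file like the one above would first be compiled into a .mo file, e.g. with GNU msgfmt, and placed at a path such as locale/de/LC_MESSAGES/tlg.mo.

```python
import gettext

# Hypothetical domain and directory; no catalog exists there in this sketch.
translation = gettext.translation('tlg', localedir='locale',
                                  languages=['de'], fallback=True)
_ = translation.gettext

label = _('Recently Changed')
print(label)   # Recently Changed (no German catalog installed)
```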

Testing Filter Code

To test your filter class, clone the git repository from git-repository and run the code on the toolserver in your $HOME. Don't forget to adjust .htaccess if necessary.

You may have to install the following Python modules locally:

  • wikitools (easy_install --prefix=$HOME/.local wikitools)

The following parameters can be passed to tlgwsgi.py in a GET request (example):

  • action
    • action=listflaws -- outputs registered filter classes
    • action=query -- queries CatGraph for categories and tests the resulting pages for flaws.
      • lang=<string> -- wiki language; at the moment, a CatGraph instance is running only for 'de'
      • query=<string> -- query string for categories, separated by semicolons, '+' and '-' for set intersection/exclusion
      • querydepth=<integer> -- recursion depth for category search
      • flaws=<string> -- space-separated list of flaws to search for ("listflaws" for possible flaw filters).
  • format=json/html/wikitext: choose output format.
  • chunked=true: use Chunked Transfer Encoding (dynamic page rendering, including status display when format=html). Useful for debugging.
  • showthreads=true: Show what's happening in the worker threads. Useful for debugging.
  • i18n=langcode: set output language. Supported values are currently "de", "en".
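A test request can be assembled with the standard library. The base URL below is a placeholder (the actual location of tlgwsgi.py is not given here), and the parameter values, including the category name, are only examples; urlencode takes care of escaping.

```python
try:
    from urllib.parse import urlencode, urlsplit, parse_qs   # Python 3
except ImportError:
    from urllib import urlencode                            # Python 2
    from urlparse import urlsplit, parse_qs

BASE_URL = 'https://example.org/~user/tlg/tlgwsgi.py'   # placeholder

params = [
    ('action', 'query'),
    ('lang', 'de'),
    ('query', 'Physik'),            # example category
    ('querydepth', '2'),
    ('flaws', 'RecentlyChanged'),   # shortname from the example filter
    ('format', 'json'),
]
url = BASE_URL + '?' + urlencode(params)
print(url)
# https://example.org/~user/tlg/tlgwsgi.py?action=query&lang=de&query=Physik&querydepth=2&flaws=RecentlyChanged&format=json
```

For debugging in a browser, format=html together with chunked=true and showthreads=true can be appended in the same way.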

Fini

If you would like a filter module to be included in the tool, you can send it to me by email (jkroll at toolserver dot org) or via a GitHub pull request. We are always happy to hear suggestions and ideas for the TLG, and you are welcome to email me with questions or ideas about this text as well.