Chapter 10. The MediaWiki API

The ability to interact with MediaWiki through an application programming interface is an evolving feature. In this chapter, you will learn about bots, programs used to automate certain administrative tasks on MediaWiki, as well as the MediaWiki API, which is still in development and is intended to provide a programming interface through which external applications can interact with MediaWiki.

Both the section on bots and the section on the API make extensive use of examples written in the Python programming language. Even if you do not know Python, you will be able to learn a lot about how bots and the API work, which you can apply to developing scripts in your language of choice. With respect to the API, all interaction is managed through URLs, so you don't even have to write a script to see samples of the API output; simply type the URL into your browser and see what is returned.

Bots: pywikipedia.py

In MediaWiki parlance, a bot is a script or program that is used to perform some administrative task in support of a wiki. MediaWiki defines a special group called bot, and any bot that is used with MediaWiki must run under a username that belongs to that group. A person with bureaucrat privileges is required to set the appropriate permissions.

The reason for requiring a special username is twofold. First, you do not want people to be able to automate tasks willy-nilly on your wiki; that's just asking for trouble from spammers and trolls. Second, tedious or time-consuming tasks can benefit greatly from automation, and because the work that bots do is often performed en masse, meaning that a single task might affect hundreds of documents, MediaWiki treats changes made by bots a little differently. For example, when viewing recent changes, changes made by bots can be excluded. Having a special bot group addresses both issues.

Bots interact with MediaWiki programmatically, but to date most bots do not use any formal MediaWiki API, because there hasn't been one. While a new API is being crafted (which you will learn about later in the chapter), bot developers have had to devise other methods of interacting with MediaWiki programmatically, which usually means writing scripts that drive the standard MediaWiki HTML interface.

One particularly well-developed bot is pywikipedia.py, which can be downloaded at http://sourceforge.net/projects/pywikipediabot/.

It is written in Python, and Python 2.3 or later must be installed (it may work on earlier versions of Python, but this hasn't been tested). Because it is designed to be used with Wikipedia, some customizations often need to be made to the scripts in order to make them work properly on a homegrown MediaWiki wiki.

The first step is to configure the bot with your site's information and with the username and password of an account in the bot group.

Configuring pywikipedia.py

Two files need to be created in order to use pywikipedia: user-config.py and another file named after your wiki. In the example, the file is called profwiki_family.py, which includes a subclass of the Family class. The name that is chosen is important because pywikipedia needs to know which family object to instantiate.

profwiki_family.py

In the main directory of the pywikipedia distribution is a file called family.py, and a directory called families. The family.py file includes the base Family class that needs to be subclassed. The "family" in question is the family of sites that make up Wikipedia. In Wikipedia's case, the family sites are versions of Wikipedia in different languages, so much of the family.py file concerns itself with languages and being able to navigate around the collection of sites that make up Wikipedia.

Inside the families directory are sample subclasses of Family that have been developed by other MediaWiki users. These sample subclasses can be used as examples for more complex configuration.

Typically, most organizations will not have such a large family as Wikipedia, so the following example shows you how to subclass the Family class for a single site:

# -*- coding: utf-8  -*-

import family

# Prof Wikis, by Mark Choate for Wrox

class Family(family.Family):
    def __init__(self):
        family.Family.__init__(self)
        # The name assigned needs to be the same as the
        # prefix used to name the file - in this case,
        # that is profwiki_family.py
        self.name = 'profwiki'
        # There's only one language to the site, so I
        # associate the domain of my site with "en".
        # In this instance, I'm accessing the domain
        # locally. If the site were on a different
        # server, I would use the actual domain name of
        # the site.
        self.langs = {'en': '127.0.0.1',}
        # The name of my test wiki is 'ProfWikis - MySQL',
        # so I assign that to the following namespaces.
        self.namespaces[4] = {'_default': [u'ProfWikis - MySQL',
            self.namespaces[4]['_default']],}
        self.namespaces[5] = {'_default': [u'ProfWikis - MySQL talk',
            self.namespaces[5]['_default']],}

    # The version of MediaWiki I am using
    def version(self, code):
        return "1.9.3"

    # The path to the wiki
    def path(self, code):
        return '/mysql/index.php'

Save this file in the families directory. After it is complete, the user-config.py file needs to be created.

user-config.py

The default configuration is in config.py. Much like the difference between DefaultSettings.php and LocalSettings.php, config.py contains the default configuration data for the bot. Place any custom configuration data specific to your wiki in a file called user-config.py:

#One line saying "mylang='language'"
#One line saying "usernames['wikipedia']['language']='yy'"
mylang='en'
family='profwiki'
# The following user name MUST be in the bot group.
usernames['profwiki']['en']= u'Mchoate'

In the profwiki_family.py file is the following line:

self.langs = {'en': '127.0.0.1',}

The value for "mylang" in the user-config file corresponds with the language specified in profwiki_family.py. The following line in user-config:

family='profwiki'

corresponds with self.name = 'profwiki' in profwiki_family.py.

What all of this means is that when you execute a script using pywikipedia, the default language is English, which maps to the server located at 127.0.0.1, using the path /mysql/index.php. The script will log in as the user Mchoate, who, if the account is in the bot group, will then be allowed to make changes to the site.

editarticle.py

The main file is wikipedia.py, which is where much of the core functionality is coded. In most cases, a script includes the wikipedia module when executing code. The first example is editarticle.py, which is a script that enables you to edit articles directly through an external editor, rather than edit them on the wiki website. I won't go into all the specific details of the implementation, but I do want to point out that the script imports the wikipedia module and the config module, which gives it all the information needed to log in and edit a file:

import wikipedia
import config

Running the script from the command line produces a prompt asking for the page to edit:

mchoate$ ./editarticle.py
Checked for running processes. 1 processes currently
   running, including the current process.
Page to edit:

Type in the page title you want to edit at the prompt, and a Tkinter window will be displayed with the wikitext to be edited (as shown in Figure 10-1).

Figure 10-1. The Tkinter editing window

In order to edit in a different editor, the user-config.py file needs to be updated. The next example shows how to edit the pages using Emacs. The following line must be added to user-config.py:

editor = 'emacs'

Once this is done, you start the script just like before, but Emacs is launched instead of the Tkinter window, as shown in Figure 10-2.

Figure 10-2. Editing wikitext using Emacs

The script works by getting a copy of the page from the wiki and then saving the data to a temporary file. In the following output sample, the temporary file that was created was /tmp/tmp8YFggr.wiki.

Then, the configured editor opens the temporary file. Once the edits are made and the file is saved, the script checks to see whether any changes have been made. If there are, it prompts the user to provide a short summary of what was changed (just as you do when editing a page through the Web interface). Once that's entered, the file is uploaded to the server:

mchoate$ ./editarticle.py
Checked for running processes. 1 processes currently
   running, including the current process.
Page to edit: Main Page
Getting page [[Main Page]]
Running editor...
/tmp/tmp8YFggr.wiki
+
+ The file has been modified.

What did you change?  I added a new sentence.
Getting a page to check if we're logged in on profwiki:en
Changing page [[en:Main Page]]

spellcheck.py

One other useful script available in pywikipedia is spellcheck.py, which (not surprisingly) performs a spell-check on wiki pages. In order to use spellcheck.py you first have to download a dictionary file from http://pywikipediabot.cvs.sourceforge.net/pywikipediabot/pywikipedia/spelling/. Included are files for several languages. The file for English is spelling-en.txt, which should be downloaded into the spelling directory in the pywikipedia distribution (it weighs in at about 2.5 megabytes of data, which is why they don't include it in the main distribution).

Once the spelling dictionaries are in place, spell-checking a page is simply a matter of executing the spellcheck.py script and passing it the name of the page to spell-check. In the next example, the article titled "Main Page" is going to be spell-checked:

mchoate$ ./spellcheck.py "Main Page"

When the script finds a questionably spelled word, the user is prompted through the console with various options. Like most spell-checkers, you are given the option to add the word to the dictionary, to ignore the word, to replace the text, to replace the text while not saving the alternative in the database, to guess, to edit by hand, or to stop checking the page altogether. In the following example, the spell-checker questions the spelling of "pywikipedia," and, amusingly, the spelling of "wiki." Once finished, the changes are saved and then uploaded back to the wiki:

Checked for running processes. 3 processes currently running,
   including the current process.
Getting wordlist
Wordlist successfully loaded.
Getting page [[Main Page]]
============================================================
Found unknown word 'Pywikipedia'
Context:
==Editing a Page with Pywikipedia==

The pywikipedia bot lets you edit
------------------------------------------------------------
a: Add 'Pywikipedia' as correct
c: Add 'pywikipedia' as correct
i: Ignore once
r: Replace text
s: Replace text, but do not save as alternative
g: Guess (give me a list of similar words)
*: Edit by hand
x: Do not check the rest of this page
: c
============================================================
Found unknown word 'wiki'
Context:
kipedia bot lets you edit pages in your wiki with an external editor such as:

*
------------------------------------------------------------
a: Add 'wiki' as correct
i: Ignore once
r: Replace text
s: Replace text, but do not save as alternative
g: Guess (give me a list of similar words)
*: Edit by hand
x: Do not check the rest of this page
: a
============================================================
Found unknown word 'TKinter'
Context:
 with an external editor such as:

* TKinter
* Emacs
* Vi

The file has been modified
------------------------------------------------------------
a: Add 'TKinter' as correct
c: Add 'tKinter' as correct
i: Ignore once
r: Replace text
s: Replace text, but do not save as alternative
g: Guess (give me a list of similar words)
*: Edit by hand
x: Do not check the rest of this page
: a
Which page to check now? (enter to stop)

These are only two examples of the scripts included in pywikipedia that can be used to assist in the maintenance of your wiki. Also included are scripts for harvesting images, uploading images, changing categories, and more. Many of them are tailored to Wikipedia, so you may find that they need to be edited to suit your needs. You will also notice that some of them, such as those that open a Tkinter window, require access to an X Window System display to run.

API.php

The developers of MediaWiki know that an easier-to-use API for MediaWiki would be a great improvement. Page through the code in pywikipedia and you'll see that it's a fairly complicated bit of programming ... and very long. A new API is being developed to streamline the developer's work, and while it is not complete, it is already very capable and affords the developer a simple, efficient way of interacting directly with the wiki's data.

Because the API is in a state of change, be sure to check www.mediawiki.org/wiki/API for the latest information about supported features. The developer is Yuri Astrakhan (User:Yurik on MediaWiki.org).

Configuration

The first step to using the API is to configure MediaWiki to use it. Add the following to LocalSettings.php:

/**
* Enable direct access to the data API
* through api.php
*/
$wgEnableAPI = true;
$wgEnableWriteAPI = true;

The API scripts are found in the /includes/api/ directory, and the entry point is api.php, which is in the top-level directory of MediaWiki, along with index.php. In order to access the API, all you need to do is replace index.php in the URL with api.php, like so (substituting your domain name, of course):

http://127.0.0.1/wiki/api.php

Accessing the API

Because of the simplicity of the API, you can access it in a variety of ways. Command-line tools such as wget and curl work, as do JavaScript, Ruby, PHP, and Python. Any language that can generate an HTTP request can be used to access the API.
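
For example, the following minimal sketch (Python 2, using the standard urllib2 module; the URL is the test wiki address used throughout this chapter, so substitute your own) retrieves the API's default help output:

# A minimal sketch: request the API's help output over HTTP.
# Any language that can make an HTTP request works the same way.
import urllib2

url = "http://127.0.0.1/mysql/api.php?action=help&format=xml"
f = urllib2.urlopen(url)
print f.read()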

Actions

The current API implements five basic actions (and an edit action should be available by the time this book is published):

  • Help: The help action returns basic documentation about how to use the API.

  • Login: Because some activities require a user to be logged in, a login action is included.

  • Opensearch: This implements the OpenSearch protocol and enables the developer to search the contents of the wiki. You can learn more about the OpenSearch protocol at http://opensearch.org/.

  • Feedwatchlist: This returns an RSS feed of a user's watchlist.

  • Query: This action enables developers to query the MediaWiki database.

Using these actions is demonstrated later in the chapter.

Formats

The available formats are as follows: json, jsonfm, php, phpfm, wddx, wddxfm, xml, xmlfm, yaml, yamlfm, and rawfm (the default value is xmlfm). The formats that end with fm are HTML representations of the output so that it can be displayed on a webpage. All actions can use any output style, with one exception. The Feedwatchlist action's output can only be one of two flavors of XML: RSS or Atom.

The following examples are based on a simple query action that requests information about the wiki's Main Page article. The URL's query string looks like this:

api.php?action=query&format=json&titles=Main+Page&meta=siteinfo&prop=info

In order to generate the different output formats, just change format=json to the desired format.

JSON Format

JSON (JavaScript Object Notation) is a data format based on JavaScript. You can find its specification at http://json.org/. Following are the HTTP headers returned by this request. Notice that the Content-Type header value is application/json:

Date: Thu, 09 Aug 2007 02:58:44 GMT
Server: Apache/1.3.33 (Darwin) PHP/5.2.0
X-Powered-By: PHP/5.2.0
Set-Cookie: wikidb_profwiki__session=onggurki4gg6v56ik26s9feef1; path=/
Expires: Thu, 01 Jan 1970 00:00:01 GMT
Cache-Control: s-maxage=0, must-revalidate, max-age=0
Connection: close
Transfer-Encoding: chunked
Content-Type: application/json; charset=utf-8

The actual JSON output follows (it has been reformatted to make it more legible):

{"query":
    {"pages":
        {"1":
           {"pageid":1,"ns":0,"title":"Main
    Page","touched":"2007-07-21T18:34:55Z","lastrevid":166
           }
        },"general":
        {"mainpage":"Main
    Page","base":"http://127.0.0.1/mysql/index.php/Main_Page",
    "sitename":"ProfWikis - MySQL","generator":"MediaWiki 1.9.3","case":"first-
    letter","rights":""
        }
    }
}
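
To consume this format from Python 2.5, you need a JSON parser; the json module did not join the standard library until Python 2.6, so the following sketch assumes the third-party simplejson module is installed:

# A sketch of parsing the JSON output, assuming the third-party
# simplejson module (any JSON parser will do).
import urllib2
import simplejson

url = ("http://127.0.0.1/mysql/api.php?"
       "action=query&format=json&titles=Main+Page&meta=siteinfo&prop=info")
data = simplejson.load(urllib2.urlopen(url))

# The pages are keyed by page ID, as shown in the output above.
for pageid, page in data["query"]["pages"].items():
    print page["title"], page["touched"]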

XML Format

The HTTP headers for the XML format are the same as for JSON, except that the Content-Type is now text/xml:

Content-Type: text/xml; charset=utf-8

The equivalent XML output follows:

<?xml version="1.0" encoding="utf-8"?>
<api>
    <query>
        <pages>
            <page pageid="1" ns="0" title="Main Page"
   touched="2007-07-21T18:34:55Z" lastrevid="166"/>
        </pages>
        <general mainpage="Main Page"
   base="http://127.0.0.1/mysql/index.php/Main_Page"
   sitename="ProfWikis - MySQL" generator="MediaWiki 1.9.3"
   case="first-letter" rights=""/>
    </query>
</api>
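
Here is a brief sketch of extracting the page attributes from this output with ElementTree, the same module the mediawikiapi.py script uses later in the chapter (the whitespace stripping anticipates the blank line MediaWiki can emit ahead of the XML declaration, as noted later):

# A sketch: parse the XML response and read the page attributes.
import urllib2
import StringIO
import xml.etree.ElementTree

url = ("http://127.0.0.1/mysql/api.php?"
       "action=query&format=xml&titles=Main+Page&meta=siteinfo&prop=info")
s = urllib2.urlopen(url).read().strip()
root = xml.etree.ElementTree.parse(StringIO.StringIO(s)).getroot()

for page in root.findall(".//page"):
    print page.get("pageid"), page.get("title"), page.get("touched")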

WDDX Format

WDDX (Web Distributed Data eXchange) is a standard originally developed by Macromedia for its ColdFusion server product. The specification can be found at www.openwddx.org. Although it has largely been surpassed by other data-exchange formats (the last news item on the OpenWDDX website was posted in 2001), it is still used widely enough that the developers felt it was important to include. The wordy WDDX output for the query follows (again formatted for clarity):

<?xml version="1.0"?>
<wddxPacket version="1.0">
    <header/>
    <data>
        <struct>
            <var name="query">
                <struct>
                    <var name="pages">
                        <struct>
                            <var name="1">
                                <struct>
                                    <var name="pageid">
                                        <number>1</number>
                                    </var>
                                    <var name="ns">
                                        <number>0</number>
                                    </var>
                                    <var name="title">
                                        <string>Main Page</string>
                                    </var>
                                    <var name="touched">
                                        <string>2007-07-21T18:34:55Z</string>
                                    </var>
                                    <var name="lastrevid">
                                        <number>166</number>
                                    </var>
                                </struct>
                            </var>
                        </struct>
                    </var>
                    <var name="general">
                        <struct>
                            <var name="mainpage">
                                <string>Main Page</string>
                            </var>
                            <var name="base">
                                <string>http://127.0.0.1/mysql/index.php/Main_Page
                            </string>
                            </var>
                            <var name="sitename">
                                <string>ProfWikis - MySQL</string>
                            </var>
                            <var name="generator">
                                <string>MediaWiki 1.9.3</string>
                            </var>
                            <var name="case">
<string>first-letter</string>
                            </var>
                            <var name="rights">
                                <string/>
                            </var>
                        </struct>
                    </var>
                </struct>
            </var>
        </struct>
    </data>
</wddxPacket>

PHP Format

The PHP serialized format is useful for PHP-based clients (see www.php.net/serialize). The Content-Type is application/vnd.php.serialized:

Content-Type: application/vnd.php.serialized; charset=utf-8

The icky output is as follows:

a:1:{s:5:"query";a:2:{s:5:"pages";a:1:{i:1;a:5:{s:6:"pageid";
   i:1;s:2:"ns";i:0;s:5:"title";s:9:"Main
   Page";s:7:"touched";s:20:"2007-07-21T18:34:55Z";s:9:"lastrevid";
   i:166;}}s:7:"general";a:6:{s:8:"mainpage";
   s:9:"Main Page";s:4:"base";s:42:"http://127.0.0.1/mysql/
   index.php/Main_Page";s:8:"sitename";s:17:"ProfWikis - MySQL";
   s:9:"generator";s:15:"MediaWiki 1.9.3";s:4:"case";s:12:"first-letter";
   s:6:"rights";s:0:"";}}}

YAML Format

Read all about YAML (and find out the definitive answer to the question regarding what YAML actually means) here: http://yaml.org/. The YAML Content-Type is as follows:

Content-Type: application/yaml; charset=utf-8

Here is the YAML output:

query:
  pages:
    -
      pageid: 1
      ns: 0
      title: Main Page
      touched: 2007-07-21T18:34:55Z
      lastrevid: 166
  general:
    mainpage: Main Page
    base: >
      http://127.0.0.1/mysql/index.php/Main_Page
    sitename: ProfWikis - MySQL
    generator: MediaWiki 1.9.3
    case: first-letter
    rights:
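
Python's standard library has no YAML parser, but the third-party PyYAML package (an assumption; see http://pyyaml.org/) can load this output directly into ordinary dictionaries, as in this sketch:

# A sketch assuming the third-party PyYAML package is installed.
import urllib2
import yaml

url = ("http://127.0.0.1/mysql/api.php?"
       "action=query&format=yaml&titles=Main+Page&meta=siteinfo&prop=info")
data = yaml.load(urllib2.urlopen(url))

print data["query"]["general"]["sitename"]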

The API provides a rich set of options for how data is transferred to your application. The final choice ultimately depends on the developer's preference, or is contingent upon other environmental factors.

In the next section, the API is illustrated with a Python script. In these examples, the selected output is XML, but it could just as easily be JSON, YAML, or WDDX, as Python libraries exist to parse those formats as well.

Python Script

The following examples all come from a Python script written to illustrate the actions and the output of the MediaWiki API. This script is loosely based on a sample script for the old MediaWiki "Query" API, posted at http://en.wikipedia.org/wiki/User:Yurik/Query_API/User_Manual#Python, but it has been expanded considerably.

I provide examples of all of the major actions, but the script is by no means exhaustive, in part because the API is still in a state of flux, with new features being added regularly. It can best be used as a starting point for developing your own scripts. It was also written with an eye toward being clear and easy to understand, rather than being particularly efficient or clever. It requires the use of Python 2.5.

Obviously, this exercise will be more informative if you are familiar with Python, but even if you are not a Python expert, you should be able to follow along as long as you have a solid understanding of computer programming. In the code and in other places where it is appropriate, you will see some additional explanation about what the Python code is doing, for readers who are unfamiliar with the language.

The first block of code in the script does some preparatory work, such as importing libraries and defining the global variables used by the script. The urllib2 library is particularly useful in this case because it offers a rich set of tools for accessing resources through URLs, including cookie management, which is needed to track the logged-in status of the script when performing tasks that require special permissions. All the functions return XML, which is parsed with Python's ElementTree module. The global variables need to be customized to your site. QUERY_URL is simply the base URL of the request, and the COOKIEFILE variable identifies where the cookie file will be stored, which enables the script to log in and stay logged in over a series of requests.

#!/usr/bin/env python
# encoding: utf-8
" " "
api.py

Created by Mark on 2007-08-06.
Copyright (c) 2007 The Choate Group, LLC. All rights reserved.
" " "

import sys
import os
import urllib
import urllib2
import cookielib
import xml.etree.ElementTree
import StringIO

# global variables for the query url, http headers and the location
# of the cookie file to be used by urllib2
QUERY_URL = u"http://127.0.0.1/mysql/api.php"
HEADERS = {"User-Agent"  : "API Test/1.0"}
COOKIEFILE = "/Users/mchoate/Documents/Code/MediaWiki/test.cookie"

The next block of code defines the ApiRequest class, which will be used like this:

api = ApiRequest()
f = api.doHelp()

In the first line, the ApiRequest is instantiated, and then the api object calls the doHelp() convenience method, which returns a file-like object that contains the XML data returned by MediaWiki:

class ApiRequest:
    """
    Encapsulates the HTTP request to MediaWiki, managing cookies and
    handling the creation of the necessary URLs.
    """
    def __init__(self):
        pass

    def _initCookieJar(self):
        """
        The LWPCookieJar class saves cookies in a format compatible with
        libwww-perl, which looks like this:

        #LWP-Cookies-2.0
        Set-Cookie3: wikidb_profwiki_Token=8ade58c0ee4b60180ab7214a93403554;
        path="/"; domain="127.0.0.1"; path_spec; expires="2007-09-08 22:36:14Z";
        version=0
        Set-Cookie3: wikidb_profwiki_UserID=3; path="/"; domain="127.0.0.1";
        path_spec; expires="2007-09-08 22:36:14Z"; version=0
        Set-Cookie3: wikidb_profwiki_UserName=Mchoate; path="/";
        domain="127.0.0.1"; path_spec; expires="2007-09-08 22:36:14Z"; version=0

        """
        cj = cookielib.LWPCookieJar()

        # If the cookie file exists, then load the cookies into the cookie jar.
        if os.path.exists(COOKIEFILE):
            cj.load(COOKIEFILE)

        # Create an opener for urllib2. This means that the cookie jar
        # will be used by urllib2 when making HTTP requests.
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        urllib2.install_opener(opener)
        return cj

    def _saveCookieJar(self, cj):
        """
        Save the cookies in the cookie file.
        """
        cj.save(COOKIEFILE)

    def execute(self, args):
        """
        This is a generic method called by the convenience methods.
        The request takes place in three stages. First, the cookie jar
        is initialized and the cookie file is loaded if it already exists. Then,
        the dictionary "args" is urlencoded and urllib2 generates the HTTP request.
        The result of the request is returned as a file-like object. Once it is
        received, the cookie data is saved so that it will be available for the
        next request, and the data is returned to the calling method.
        """
        cj = self._initCookieJar()
        req = urllib2.Request(QUERY_URL, urllib.urlencode(args), HEADERS)
        f = urllib2.urlopen(req)
        self._saveCookieJar(cj)
        return f

The remaining methods all call the execute method, using arguments appropriate for the kind of request being made. Not every option is explored in the remaining code, but the script is easily extended with new request types.

Help

In order to get the latest information, you can make a help call to the API, which is particularly useful given that the API is still evolving. The URL looks like this:

api.php?action=help

Alternatively, because it is the default action, it can also look like this:

api.php

When this is executed, a fairly detailed list of actions and their associated parameters is returned.

ApiRequest.doHelp()

The ApiRequest Python class creates a Help action request with the following method. The values that will be passed to the execute() method are set in a dictionary object:

def doHelp(self, format="xml"):
    args={"action": "help",
            "format": format}
    f = self.execute(args)
    return f

When the execute() method is called, the Python dictionary object is converted to a URL query string (?action=help&format=xml), and then the HTTP request is made. The results are returned in a file-like object, which, in addition to the usual Python file object methods such as read(), has two additional methods that can be useful, geturl() and info(), whose functions are described in the sample code that follows:

api = ApiRequest()
f = api.doHelp()

# Print a string representation of the URL that was called.
# Note that this doesn't include the arguments, so it would
# look something like this: http://127.0.0.1/wiki/api.php
print f.geturl()

# Print the headers from the HTTP response.
print f.info()

# Print the contents of the file-like object, which can be
# xml, wddx, yaml, json, etc., depending on the format requested.
print f.read()

Login

The login action has two required parameters and one optional parameter. Required are lgname and lgpassword; optional is lgdomain. The URL required to execute this action is as follows:

api.php?action=login&lgname=Mchoate&lgpassword=XXX&format=xml

The XML output of the action includes information about whether the login attempt was successful, the user ID and username of the person logged in, and a token that signifies a successful login, which can be used in subsequent requests to identify the logged-in user.

MediaWiki also sets a cookie on the client if the login is successful. In the mediawikiapi.py script, the urllib2 opener handles accepting the cookie and sending it back on subsequent requests, which removes the need to use the token. The cookie encodes the same data as the value of lgtoken.

<?xml version="1.0" encoding="utf-8"?>
<api>
    <login result="Success" lguserid="3" lgusername="Mchoate"
   lgtoken="8ade58c0ee4b60180ab7214a93403554"/>
</api>

The doLogin() method functions slightly differently than the other methods do in that it doesn't return data from the request. Instead, it returns a Boolean value indicating whether the login was successful or not. This method can be called like so:

api = ApiRequest()
if api.doLogin("Mchoate", "connor"):
    print "Login was successful.\n"
else:
    print "Login failed.\n"

ApiRequest.doLogin()

The doLogin() method implementation follows. Notice in the code that the XML returned is parsed by ElementTree, and the content of the XML is tested in order to determine whether the login was successful or not:

def doLogin(self, name, password, domain="", format="xml"):
        """
        The login action is used to log in. If successful, a cookie
        is set, and an authentication token is returned.

        Example:
          api.php?action=login&lgname=user&lgpassword=password
        """
        args={
            "action"    : "login",
            "format"    : format,
            "lgname"    : name,
            "lgpassword": password,
        }
        # The domain is optional
        if domain:
            args.update({"lgdomain": domain})

        # MediaWiki returns an XML document with a blank line at
        # the top, which causes an error while parsing. The
        # following code strips whitespace at the front and
        # back of the XML document and returns a string.
        s = self.execute(args).read().strip()

        # ElementTree expects a file-like object,
        # so one is created for it.
        f = StringIO.StringIO(s)
        root = xml.etree.ElementTree.parse(f).getroot()

        # The root element is the <api> element.
        login = root.find("login")

        # The <login> element has an attribute 'result'
        # that returns 'Success' if the login was successful.
        test = login.attrib["result"]
        if test == "Success":
            return True
        else:
            return False

Opensearch

This action enables you to search your wiki. The method is very similar to the doHelp() method and should be self-explanatory.

ApiRequest.doOpensearch()

def doOpenSearch(self, search="", format="xml"):
        args={
            "action": "opensearch",
            "search": search,
            "format": format
        }
        f = self.execute(args)
        return f
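
A typical call passes the term to search for and prints whatever matches come back (a sketch; the search term is arbitrary):

api = ApiRequest()
f = api.doOpenSearch("Main")
print f.read()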

Feedwatchlist

The Feedwatchlist action returns either an Atom or an RSS feed containing a list of pages that are being watched by the user. In this respect, it differs from the other actions in that it returns a special XML document. Unlike the others, it does not have a format parameter. Instead, it has a feedformat parameter that can be either "rss" or "atom".

ApiRequest.doFeedWatchlist()

def doFeedWatchList(self, feedformat="rss"):
        args={
            "action"    : "feedwatchlist",
            "feedformat": feedformat
        }
        f = self.execute(args)
        return f
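
Because a watchlist belongs to a particular user, the script must be logged in before making this request; a typical call therefore combines it with the doLogin() method shown earlier (a sketch):

api = ApiRequest()
if api.doLogin("Mchoate", "connor"):
    f = api.doFeedWatchList(feedformat="rss")
    print f.read()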

RSS Feed

If an RSS feed is requested, then the following XML will be returned:

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/css"
   href="http://127.0.0.1/mysql/skins/common/feed.css?42b"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
        <channel>
                <title>ProfWikis - MySQL - My watchlist [en]</title>
                <link>http://127.0.0.1/mysql/index.php/Special:Watchlist</link>
                <description>My watchlist</description>
                <language>en</language>
                <generator>MediaWiki 1.9.3</generator>
                <lastBuildDate>Thu, 09 Aug 2007 22:33:30 GMT</lastBuildDate>
                <item>
                        <title>Main Page</title>
                        <link>http://127.0.0.1/mysql/index.php/Main_Page</link>
                        <description> (WikiSysop)</description>
                        <pubDate>Sat, 21 Jul 2007 18:31:51 GMT</pubDate>
                <dc:creator>WikiSysop</dc:creator>  </item>
        </channel>
</rss>

Atom Feed

If an "atom" feed is requested, then the data is reformatted to this specification:

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/css"
href="http://127.0.0.1/mysql/skins/common/feed.css?42b"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
                <id>http://127.0.0.1/mysql/api.php</id>
                <title>ProfWikis - MySQL - My watchlist [en]</title>
                <link rel="self" type="application/atom+xml"
   href="http://127.0.0.1/mysql/api.php"/>
                <link rel="alternate" type="text/html"
   href="http://127.0.0.1/mysql/index.php/Special:Watchlist"/>
                <updated>2007-08-09T22:33:31Z</updated>
                <subtitle>My watchlist</subtitle>
                <generator>MediaWiki 1.9.3</generator>

        <entry>
                <id>http://127.0.0.1/mysql/index.php/Main_Page</id>
                <title>Main Page</title>
                <link rel="alternate" type="text/html"
   href="http://127.0.0.1/mysql/index.php/Main_Page"/>
                <updated>2007-07-21T18:31:51Z</updated>

                <summary type="html"> (WikiSysop)</summary>
                <author><name>WikiSysop</name></author>
        </entry>
</feed>

Query

The query action is the workhorse of the MediaWiki API. It takes a complex set of parameters whose composition varies depending on the various kinds of queries that are available. The base query URL starts like this:

api.php?action=query

This doesn't get you very far because all queries need some kind of parameter that narrows down the selection of what is returned (otherwise, what's the point of querying?). The first group of queries uses one of the following parameters: titles, pageids, or revids. These are described in the section "Searching by Title, Page ID, or Revision ID" that follows. The next query type is a list, which is described in the "Lists" section, followed by the last basic type, generators, discussed in the "Generators" section.

Searching by Title, Page ID, or Revision ID

There are three parameters, titles, pageids, and revids, that enable you to query MediaWiki by title, page ID, or revision ID, respectively. All three work similarly, so the following examples use only titles; just bear in mind that you can do the same thing with the other parameters as well.

In all three cases, you can search for more than one value. To do so, you only need to separate the values by the pipe (|) character (a pattern used throughout the API), as is shown in the following example:

api.php?action=query&titles=Main+Page|Some+Other+Page&format=xml

Because the output of query actions is more varied than that of the other actions reviewed, the following sections use a slightly different format to describe them. First, you will see examples showing how the URLs can be formed to get the particular information you are looking for. Once you've reviewed the important variations, you will then see the mediawikiapi.py script method that can be used to generate the different requests.

Simple Titles Query

A basic query that requests pages based upon their titles is illustrated in the following example:

api.php?action=query&titles=Main+Page&format=xml

The XML-formatted output of this request includes information about the page ID, plus the namespace of the page (which in this case is the default namespace):

<?xml version="1.0" encoding="utf-8"?>
<api>
    <query>
        <pages>
            <page pageid="1" ns="0" title="Main Page"/>
        </pages>
    </query>
</api>

This, of course, is of little value unless you simply wanted to know the page ID for this particular page. Chances are good you will want more information, and this information is requested with the prop parameter, which can take one of two values, both of which are illustrated next.

Property: info

The following URL requests general information about the page titled "Main Page" by assigning the info value to the prop parameter:

api.php?action=query&format=xml&titles=Main+Page&prop=info

The output of this request now includes more information: the date the page was last "touched," and the last revision id (or current revision id, depending on whether you are a glass half-empty or glass half-full kind of person):

<?xml version="1.0" encoding="utf-8"?>
<api>
    <query>
        <pages>
            <page pageid="1" ns="0" title="Main Page"
   touched="2007-07-21T18:34:55Z" lastrevid="166"/>
        </pages>
    </query>
</api>

Property: revisions

The second value available to prop is the revisions value:

api.php?action=query&format=xml&titles=Main+Page&prop=revisions

When this value is used, additional information is returned about the last (or current) revision id. Actually, the only new data it adds by default is the oldid number:

<?xml version="1.0" encoding="utf-8"?>
<api>
    <query>
        <pages>
            <page pageid="1" ns="0" title="Main Page">
                <revisions>
                    <rev revid="166" pageid="1" oldid="157"/>
                </revisions>
            </page>
        </pages>
    </query>
</api>

There are times when you want more data about previous revisions, so the MediaWiki API provides a handful of parameters that can be used alongside the prop parameter when its value is set to revisions. These parameters are outlined in the following list.

rvprop: Determines which revision properties to return. Values can be timestamp, user, or comment. More than one value can be included by separating them with a | character, like so: rvprop=timestamp|user|comment.

rvlimit: Determines the maximum number of revisions to return. The default is 10.

rvstartid: The revision ID that indicates the starting point of the list of revisions that will be returned. Note that the rvstartid value can be higher or lower than the rvendid value, depending on the direction of the sort, as specified in rvdir (see below).

rvendid: The ending point of the range of revision IDs whose starting point is specified by rvstartid.

rvstart: The timestamp of the starting point of the revisions to return.

rvend: The timestamp of the ending point of the revisions to return.

rvdir: Determines the sort direction of the returned list of revisions, either from older to newer or newer to older. The possible values are older and newer. The default is older, which lists the revisions in descending order, with the newest revision first.

Using these parameters can be somewhat tricky at first if you do not understand the impact that rvdir has on the output. It is best illustrated with a few examples. The following URL illustrates a basic request that includes a request for information about when each revision was created, who created it, and any user comments that may have been added. It also limits results to 10 revisions and returns the list of revisions in reverse order of the creation date, so that the most recent revision is listed first, followed by the rest in descending order:

api.php?format=xml&rvprop=timestamp%7Cuser%7Ccomment&
  prop=revisions&rvdir=older&titles=Main+Page&rvlimit=
  10&action=query

The XML output is as follows:

<?xml version="1.0" encoding="utf-8"?>
<api>
    <query>
        <pages>
            <page pageid="1" ns="0" title="Main Page">
                <revisions>
                    <rev revid="166" pageid="1" oldid="157"
 user="WikiSysop" timestamp="2007-07-21T18:31:51Z"/>
                    <rev revid="165" pageid="1" oldid="156"
 user="WikiSysop" timestamp="2007-07-17T23:47:46Z"/>
                    <rev revid="150" pageid="1" oldid="141"
 user="WikiSysop" timestamp="2007-06-21T19:02:09Z"/>
                    <rev revid="149" pageid="1" oldid="140"
 user="WikiSysop" timestamp="2007-06-21T19:00:21Z"/>
                    <rev revid="146" pageid="1" oldid="137"
 user="WikiSysop" timestamp="2007-06-21T16:19:19Z"/>
                    <rev revid="134" pageid="1" oldid="125"
 user="WikiSysop" timestamp="2007-06-21T14:58:39Z"/>
                    <rev revid="93" pageid="1" oldid="87"
 user="WikiSysop" timestamp="2007-06-04T20:05:07Z"/>
                    <rev revid="91" pageid="1" oldid="85"
 user="WikiSysop" timestamp="2007-06-01T19:48:49Z"/>
                    <rev revid="77" pageid="1" oldid="74"
 user="WikiSysop" timestamp="2007-05-31T18:12:51Z"/>
                    <rev revid="76" pageid="1" oldid="73"
 user="WikiSysop" timestamp="2007-05-31T17:43:53Z"/>
                </revisions>
            </page>
        </pages>
    </query>
    <query-continue>
        <revisions rvstartid="64"/>
    </query-continue>
</api>

At the end of the XML data is a <query-continue> XML tag. This is here because the request limited the returned values to no more than 10. Because there are more than 10 revisions for this page, the id of the next revision in sequence is returned so that it can be used on subsequent requests.
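
To retrieve the rest of the history, repeat the same request with an rvstartid parameter set to the value returned in <query-continue> (64 in this case), so that the listing resumes where the previous response left off:

api.php?format=xml&rvprop=timestamp%7Cuser%7Ccomment&
  prop=revisions&rvdir=older&rvstartid=64&titles=Main+Page&
  rvlimit=10&action=query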

Revision Direction: older

The next query is just like the previous query except that two parameters are added: rvstartid and rvendid. The query says to start with revision ID 77 and to end with revision ID 150:

api.php?format=xml&rvprop=timestamp%7Cuser%7Ccomment&prop=
  revisions&rvdir=older&rvstartid=77&titles=Main+Page&rvlimit
  =10&rvendid=150&action=query

When this query is executed, the following data is returned:

<?xml version="1.0" encoding="utf-8"?>
<api>
    <query>
        <pages>
            <page pageid="1" ns="0" title="Main Page"/>
        </pages>
    </query>
</api>

You may have noticed that something is missing. Where are the revisions between 77 and 150? The answer is that the request asks for a set of revisions that cannot exist. The order of the results is the same as in the previous query, which means that the most recent revision is first, followed by the other revisions in descending order. This request tells MediaWiki to start at revision ID 77 and to end at revision ID 150.

Because the list is sorted in descending order, the revisions that follow revision 77 all have lower IDs, so revision 150 can never be reached and MediaWiki returns nothing. One solution is to tell MediaWiki to start with revision 150 and to end with revision 77. The other solution is to request the list in the reverse direction, from oldest to newest.

Revision Direction: newer

The modified request now looks like the following; the only change is setting the rvdir parameter to the value newer:

format=xml&rvprop=timestamp%7Cuser%7Ccomment&prop=revisions
  &rvdir=newer&rvstartid=77&titles=Main+Page&rvlimit=10&
  rvendid=150&action=query

The results of this query are markedly different from the first. The first revision listed has a revision ID of 77, and the last revision has an ID of 150, so it has constrained the list according to the start and end properties set in the query.

<?xml version="1.0" encoding="utf-8"?>
<api>
    <query>
        <pages>
            <page pageid="1" ns="0" title="Main Page">
                <revisions>
                    <rev revid="77" pageid="1" oldid="74"
   user="WikiSysop" timestamp="2007-05-31T18:12:51Z"/>
                    <rev revid="91" pageid="1" oldid="85"
   user="WikiSysop" timestamp="2007-06-01T19:48:49Z"/>
                    <rev revid="93" pageid="1" oldid="87"
   user="WikiSysop" timestamp="2007-06-04T20:05:07Z"/>
                    <rev revid="134" pageid="1" oldid="125"
user="WikiSysop" timestamp="2007-06-21T14:58:39Z"/>
                    <rev revid="146" pageid="1" oldid="137"
   user="WikiSysop" timestamp="2007-06-21T16:19:19Z"/>
                    <rev revid="149" pageid="1" oldid="140"
   user="WikiSysop" timestamp="2007-06-21T19:00:21Z"/>
                    <rev revid="150" pageid="1" oldid="141"
   user="WikiSysop" timestamp="2007-06-21T19:02:09Z"/>
                </revisions>
            </page>
        </pages>
    </query>
</api>

ApiRequest.doTitlesQuery()

Because there are so many variations to the parameters that can be used when making these kinds of queries, the query code in the example uses a Python idiom that enables you to pass a varying number of parameters to the method. Note that the same basic method can be used for pageids and revids queries with only the slight modification of swapping pageids wherever titles appears, or revids wherever titles appears:

def doTitlesQuery(self, titles, format, **args):
        args.update({
            "action": "query",
            "titles": titles,
            "format": format,
        })
        f = self.execute(args)
        return f

The **args argument is a dictionary of key/value pairs that is generated from any extra named parameters in the method call. This method requires a value for titles and format, but will accept any number of additional named parameters, as illustrated in the following example, which shows three different but perfectly acceptable ways of calling the method:

api = ApiRequest()
f = api.doTitlesQuery("Main Page", "xml")
f = api.doTitlesQuery("Main Page", "xml", prop="info")
f = api.doTitlesQuery("Main Page", "xml", prop="revisions", rvlimit="10")
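
Following the substitution described earlier, a pageids version of the method would look like this (a hypothetical doPageidsQuery(), shown here for illustration; it is not part of the script as written):

def doPageidsQuery(self, pageids, format, **args):
        # Identical to doTitlesQuery() except that pages are
        # identified by page ID rather than by title.
        args.update({
            "action": "query",
            "pageids": pageids,
            "format": format,
        })
        f = self.execute(args)
        return f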

Lists

Queries that return lists work a little differently than the queries seen thus far. There are a few important things to understand:

  1. You can request eight pre-defined lists: allpages, logevents, watchlist, recentchanges, backlinks, embeddedin, imagelinks, and usercontribs. These are described in detail below.

  2. Lists are used instead of titles, pageids, and revids. All four of these query types are mutually exclusive.

  3. Lists cannot be used with any of the prop and revision parameters.

  4. There is an exception to rule number 3. Lists can be used as what is called a generator in the API, which means that the list can be used in place of titles, pageids, and revids, in which case all of the prop and revision parameters are available to the request. This means that instead of typing in a long list of page titles to search for, you can use a list as the source. This concept is best illustrated with examples, which can be found in the section "Generators" later in this chapter.

A basic list query is constructed like the following URL:

api.php?action=query&format=xml&list=allpages

The output of such a query follows:

<?xml version="1.0" encoding="utf-8"?>
<api>
    <query-continue>
        <allpages apfrom="Image galleries"/>
    </query-continue>
    <query>
        <allpages>
            <p pageid="49" ns="0" title="ASamplePage"/>
            <p pageid="28" ns="0" title="A new page"/>
            <p pageid="33" ns="0" title="Basic Image Links"/>
            <p pageid="34" ns="0" title="Basic Media Namespace Links"/>
            <p pageid="41" ns="0" title="College Basketball"/>
            <p pageid="42" ns="0" title="College Football"/>
            <p pageid="39" ns="0" title="College Sports"/>
            <p pageid="20" ns="0" title="Core parser functions"/>
            <p pageid="19" ns="0" title="Headings"/>
            <p pageid="35" ns="0" title="Image Alignment"/>
        </allpages>
    </query>
</api>

An important item to note is that by default, all requests are limited to 10 items. This can be overridden with the appropriate parameter, which varies according to which list type is being requested. Each list type has its own collection of parameters, and these are documented in the following pages, along with sample output.

List: allpages

apfrom: Returns a list of pages ordered alphabetically, starting with titles equal to or higher than the letter or letters used. If apfrom="bamboozled", then the list will return only those pages that come after bamboozled when sorted alphabetically.

apprefix: Returns a list of pages whose title starts with the string passed as the value. If apprefix="Ma", then all pages whose title starts with Ma will be returned. If you leave the parameter empty, as in apprefix="", then no pages will be returned. Leave it out of the query entirely if you do not want to use it.

apnamespace: The number of the namespace from which the list should be derived. It should be a value from 0 to 15 (unless you've added custom namespaces).

apfilterredir: Determines which pages to list based upon one of three values: all, redirects, and nonredirects. The default is all.

aplimit: Determines the maximum number of pages to return. The default is 10.

In the previous section, an example of a simple request was already illustrated. The following request is a little more complicated, and uses the parameters available to the allpages request:

api.php?apfilterredir=all&apprefix=M&format=xml&list=allpages
  &apfrom=A&apnamespace=0&action=query&prop=revisions&
  aplimit=10

The preceding request asks for all pages whose titles start at the letter A or later in the alphabet and whose titles have the prefix M. Because M comes after A in the alphabet, the result is all pages whose titles start with M:

<?xml version="1.0" encoding="utf-8"?>
<api>
    <query>
        <allpages>
            <p pageid="24" ns="0" title="Magic Words"/>
            <p pageid="10" ns="0" title="Magic Words that
    show information about the page"/>
            <p pageid="9" ns="0" title="Magic Words that use underscores"/>
            <p pageid="11" ns="0" title="Magic word tests"/>
            <p pageid="1" ns="0" title="Main Page"/>
            <p pageid="22" ns="0" title="Math"/>
            <p pageid="51" ns="0" title="MediaWiki Extensions"/>
        </allpages>
    </query>
</api>

The remaining list types all have unique parameters, but there is a lot of overlap with the parameters used in the previous example. Therefore, the next sections document the parameters each list type uses, but do not provide specific output examples.

List: logevents

letype: Filters log events based on the type of log event. Legal values are block, protect, rights, delete, upload, move, import, renameuser, newusers, and makebot. Separate each value with a pipe (|) if you are filtering on more than one value.

lestart: The timestamp of the starting point of the list of log events that will be returned.

leend: The timestamp of the ending point of the list of log entries that will be returned.

ledir: The sort order of the list of log entries that is returned. The value can be newer or older. The default value is older.

leuser: Filters entries by username.

letitle: Returns log entries for a given page.

lelimit: Determines the maximum number of log entries to return. The default is 10.

List: watchlist

wlallrev: Includes all revisions of the pages in the watchlist that will be returned. This parameter doesn't take a value; if it is present in the query, all revisions are returned. If it is absent, only the current page is returned. Example: action=query&list=watchlist&wlallrev

wlstart: The timestamp of the starting point of the watchlist that will be returned.

wlend: The timestamp of the ending point of the watchlist that will be returned.

wlnamespace: The number of the namespace from which the list should be derived. It should be a value from 0 to 15 (unless you've added custom namespaces).

wldir: Determines the sort direction of the returned list of pages, either from older to newer or newer to older. The value is either older or newer, and the default value is older.

wllimit: Determines the maximum number of pages to return. The default is 10.

wlprop: Specifies which additional items to get (nongenerator mode only). The value can be one or more of the following: user, comment, timestamp, and patrol. When using multiple values, separate them with a pipe (|).

List: recentchanges

rcstart: The timestamp of the starting point of the list of recent changes that will be returned.

rcend: The timestamp of the ending point of the list of recent changes that will be returned.

rcdir: Determines the sort direction of the returned list of recent changes, either from older to newer or newer to older. The value is either older or newer.

rcnamespace: The number of the namespace from which the list should be derived. It should be a value from 0 to 15 (unless you've added custom namespaces).

rcprop: Includes additional properties in the return values. The value can be one or more of the following: user, comment, and flags. When using multiple values, separate them with a pipe (|).

rcshow: Filters returned items based on the criteria specified in the value. Possible values are minor, !minor, bot, !bot, anon, and !anon. Values that start with ! are negations. In other words, minor means include minor changes in the results, whereas !minor means do not include minor changes in the results. Likewise, you can specify whether to include changes made by bots and changes made by anonymous users.

rclimit: Determines the maximum number of pages to return. The default is 10.

List: backlinks

blcontinue: When more results are available, use this to continue.

blnamespace: The number of the namespace from which the list should be derived. It should be a value from 0 to 15 (unless you've added custom namespaces).

bllimit: Determines the maximum number of pages to return. The default is 10.

List: embeddedin

einamespace: The number of the namespace from which the list should be derived. It should be a value from 0 to 15 (unless you've added custom namespaces).

eiredirect: If the linking page is a redirect, this finds all pages that link to that redirect (not implemented).

eilimit: Determines the maximum number of pages to return. The default is 10.

List: imagelinks

ilnamespace: The number of the namespace from which the list should be derived. It should be a value from 0 to 15 (unless you've added custom namespaces).

illimit: Determines the maximum number of pages to return. The default is 10.

List: usercontribs

uclimit: Determines the maximum number of contributions to return. The default is 10.

ucstart: The timestamp of the starting point of the list of user contributions that will be returned.

ucend: The timestamp of the ending point of the list of user contributions that will be returned.

ucuser: The username whose contributions will be returned.

ucdir: Determines the sort direction of the returned list of user contributions, either from older to newer or newer to older. The value is either older or newer.

ApiRequest.doListQuery()

The Python method used to generate list queries is similar to the one used to generate titles queries. This generic method can be used for any list type:

def doListQuery(self, list, format, **args):
        args.update({
            "action": "query",
            "list": list,
            "format": format,
        })
        f = self.execute(args)
        return f

No programmer likes to type any more than they have to, so slightly more convenient methods have been included for each specific kind of list to be queried:

def doListAllpagesQuery(self, **args):
        args.update({
        "action":"query",
        "list": "allpages",
        })
        f = self.execute(args)
        return f
def doListLogeventsQuery(self, **args):
        args.update({
        "action":"query",
        "list": "logevents",
        })
        f = self.execute(args)
        return f

    def doListWatchlistQuery(self, **args):
        args.update({
        "action":"query",
        "list": "watchlist",
        })
        f = self.execute(args)
        return f

    def doListRecentchangesQuery(self, **args):
        args.update({
        "action":"query",
        "list": "recentchanges",
        })
        f = self.execute(args)
        return f

    def doListBacklinksQuery(self, **args):
        args.update({
        "action":"query",
        "list": "backlinks",
        })
        f = self.execute(args)
        return f

    def doListEmbeddedinQuery(self, **args):
        args.update({
        "action":"query",
        "list": "embeddedin",
        })
        f = self.execute(args)
        return f


    def doListImagelinksQuery(self, **args):
        args.update({
        "action":"query",
        "list": "imagelinks",
        })
        f = self.execute(args)
        return f

    def doListUsercontribsQuery(self, **args):
        args.update({
        "action":"query",
        "list": "usercontribs",
        })
        f = self.execute(args)
        return f
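
These convenience methods set only action and list, so the format (and any list-specific parameters) must still be passed as keyword arguments. For example, assuming api is an ApiRequest instance:

f = api.doListRecentchangesQuery(format="xml", rclimit="5")
print f.read()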

Generators

Earlier in this chapter, you learned that lists could be used as generators in place of titles, pageids, and revids queries. You also saw that this concept is most easily understood by looking at sample output, which is what you will see here.

In order to use a list as a generator, all you need to do is refer to it as a generator in the query. Instead of list=allpages, use generator=allpages, as illustrated in the following example:

api.php?generator=allpages&format=xml&action=query

That's all there is to it. The advantage of using a generator is that you then have access to the prop parameter and the revision properties, and can thus query a much richer set of information than you can with lists alone.

The following two API requests will return the same data, even though one is a generator and the other is a list:

api.php?action=query&format=xml&generator=allpages
api.php?action=query&format=xml&list=allpages

Both of these requests return the following data:

<?xml version="1.0" encoding="utf-8"?>
<api>
    <query-continue>
        <allpages gapfrom="Image galleries"/>
    </query-continue>
    <query>
        <pages>
            <page pageid="49" ns="0" title="ASamplePage"/>
            <page pageid="28" ns="0" title="A new page"/>
            <page pageid="33" ns="0" title="Basic Image Links"/>
            <page pageid="34" ns="0" title="Basic Media Namespace Links"/>
            <page pageid="41" ns="0" title="College Basketball"/>
            <page pageid="42" ns="0" title="College Football"/>
            <page pageid="39" ns="0" title="College Sports"/>
            <page pageid="20" ns="0" title="Core parser functions"/>
            <page pageid="19" ns="0" title="Headings"/>
            <page pageid="35" ns="0" title="Image Alignment"/>
        </pages>
    </query>
</api>

The difference becomes apparent when you use both the generator and the prop parameter:

api.php?action=query&format=xml&generator=allpages&prop=revisions

This request returns the following data:

<?xml version="1.0" encoding="utf-8"?>
<api>
    <query-continue>
        <allpages gapfrom="Image galleries"/>
    </query-continue>
<query>
        <pages>
            <page pageid="49" ns="0" title="ASamplePage">
                <revisions>
                    <rev revid="164" pageid="49" oldid="155"/>
                </revisions>
            </page>
            <page pageid="28" ns="0" title="A new page">
                <revisions>
                    <rev revid="111" pageid="28" oldid="102" minor=""/>
                </revisions>
            </page>
            <page pageid="33" ns="0" title="Basic Image Links">
                <revisions>
                    <rev revid="121" pageid="33" oldid="112"/>
                </revisions>
            </page>
            <page pageid="34" ns="0" title="Basic Media Namespace Links">
                <revisions>
                    <rev revid="123" pageid="34" oldid="114"/>
                </revisions>
            </page>
            <page pageid="41" ns="0" title="College Basketball">
                <revisions>
                    <rev revid="138" pageid="41" oldid="129"/>
                </revisions>
            </page>
            <page pageid="42" ns="0" title="College Football">
                <revisions>
                    <rev revid="139" pageid="42" oldid="130"/>
                </revisions>
            </page>
            <page pageid="39" ns="0" title="College Sports">
                <revisions>
                    <rev revid="143" pageid="39" oldid="134"/>
                </revisions>
            </page>
            <page pageid="20" ns="0" title="Core parser functions">
                <revisions>
                    <rev revid="160" pageid="20" oldid="151"/>
                </revisions>
            </page>
            <page pageid="19" ns="0" title="Headings">
                <revisions>
                    <rev revid="59" pageid="19" oldid="56"/>
                </revisions>
            </page>
            <page pageid="35" ns="0" title="Image Alignment">
                <revisions>
                    <rev revid="127" pageid="35" oldid="118"/>
                </revisions>
            </page>
        </pages>
    </query>
</api>

There are, of course, many variations on the kinds of requests that can be made this way, and all of the revision properties can be used as well to construct complex queries.
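
For example, the following request (a sketch reusing the rvprop values shown with the titles queries) returns the timestamp, user, and comment of the latest revision of each page found by the allpages generator:

api.php?action=query&format=xml&generator=allpages&prop=revisions&rvprop=timestamp|user|comment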

ApiRequest.doGeneratorQuery()

The Python method to request a generator is almost identical to the list request, except that the generator parameter is used instead of the list parameter:

def doGeneratorQuery(self, list, format, **args):
        args.update({
        "action": "query",
        "generator": list,
        "format": format}
        )
        f = self.execute(args)
        return f

This method can be used to replicate the queries that were used to illustrate generator output, as follows:

api = ApiRequest()
f = api.doGeneratorQuery("allpages", "xml")
print f.read()

f = api.doGeneratorQuery("allpages", "xml", prop="revisions")
print f.read()

In Development

One feature missing from the API is the capability to edit pages programmatically. This feature is currently under active development and will be available in future versions of MediaWiki.

api.py

The complete code of the api.py script follows:

#!/usr/bin/env python
# encoding: utf-8
" " "
api.py

Created by Mark on 2007-08-06.
Copyright (c) 2007 The Choate Group, LLC. All rights reserved.
" " "

import sys
import os
import urllib
import urllib2
import cookielib
import xml.etree.ElementTree
import StringIO
# Customize the following values for your wiki installation
QUERY_URL = u"http://127.0.0.1/mysql/api.php"
HEADERS = {"User-Agent"  : "API Test/1.0"}
COOKIEFILE = "/Users/mchoate/Documents/Code/Metaserve/MediaWiki/test.cookie"


class ApiRequest:
    " " "
    Encapsulates the HTTP request to MediaWiki, managing cookies and
    handling the creation of the necessary URLs.
    " " "

    def _initCookieJar(self):
        " " "
        The LWPCookieJar class saves cookies in a format compatible with
        libwww-perl, which looks like this:

        #LWP-Cookies-2.0
        Set-Cookie3: wikidb_profwiki_Token=8ade58c0ee4b60180ab7214a93403554;
   path="/"; domain="127.0.0.1"; path_spec; expires="2007-09-08 22:36:14Z";
   version=0
        Set-Cookie3: wikidb_profwiki_UserID=3;
   path="/"; domain="127.0.0.1"; path_spec; expires="2007-09-08 22:36:14Z";
   version=0
        Set-Cookie3: wikidb_profwiki_UserName=Mchoate;
   path="/"; domain="127.0.0.1"; path_spec; expires="2007-09-08 22:36:14Z";
   version=0

        " " "
        cj = cookielib.LWPCookieJar()
        # If the cookie file exists, then load the cookie into the cookie jar.
        if os.path.exists(COOKIEFILE):
            cj.load(COOKIEFILE)
        # Create an opener for urllib2. This means that the cookie jar
        # will be used by urllib2 when making HTTP requests.
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        urllib2.install_opener(opener)
        return cj

    def _saveCookieJar(self,cj):
        cj.save(COOKIEFILE)


    def doHelp(self, format="xml"):
        args={"action": "help",
            "format": format}
        f = self.execute(args)
        return f

    def doLogin(self, name, password, domain="", format="xml"):
        " " "
        The login action is used to login. If successful, a cookie
        is set, and an authentication token is returned.
Example:

          api.php?action=login&lgname=user&lgpassword=password
        " " "
        args={
            "action"   : "login",
            "format"    : format,
            "lgname"    : name,
            "lgpassword": password,
        }
        # The domain is optional
        if domain:
            args.update({"lgdomain":domain})

        # MediaWiki returns an XML document with a blank line at
        # the top, which causes an error while parsing. The
        # following code strips whitespace at the front and
        # back of the XML document and returns a string.
        s = self.execute(args).read().strip()

        # ElementTree expects a file-like object,
        # so one is created for it.
        f = StringIO.StringIO(s)
        root = xml.etree.ElementTree.parse(f).getroot()

        # The root element is the <api> element.
        login = root.find("login")

        # The <login> element has an attribute 'result'
        # that returns 'Success' if the login was successful.
        test = login.attrib["result"]
        if test == "Success":
            return True
        else:
            return False


    def doOpenSearch(self, search="", format="xml"):
        # The opensearch action takes the search string as a parameter.
        args={
              "action"    : "opensearch",
              "search"    : search,
              "format"    : format
        }
        f = self.execute(args)
        return f

    def doFeedWatchList(self, feedformat="rss"):
        args={
              "action"   : "feedwatchlist",
              "feedformat": feedformat,
        }
        f = self.execute(args)
        return f
    def doQuery(self, **args):
        return self.execute(args)

    def doTitlesQuery(self, titles="Main Page", prop="info", meta="siteinfo", format="xml"):
        args={
        "action": "query",
        "titles": titles,
        "prop": prop,
        "meta": meta,
        "format": format
        }
        f = self.execute(args)
        return f


    def doTitlesQueryNoMeta(self, titles="Main Page", prop="info", format="xml"):
        args={
        "action": "query",
        "titles": titles,
        "prop": prop,
        "format": format
        }
        f = self.execute(args)
        return f


    def doSimpleTitlesQuery(self, titles="Main Page", format="xml"):
        args={
        "action": "query",
        "titles": titles,
        "format": format
        }
        f = self.execute(args)
        return f

    def doTitlesQuery2(self, titles="Main Page",
   rvprop="timestamp|user|comment", rvlimit="50", rvdir="forward", format="xml"):
        args={
        "action": "query",
        "titles": titles,
        "prop": "revisions",
        "rvprop": rvprop, #timestamp|user|comment|content
        "rvlimit": rvlimit,
        #"rvstartid": "77",
        #"rvendid": "200",
        #"rvstart": rvstart, #timestamp
        #"rvend": rvend, #timestamp
        "rvdir": rvdir, #newer|older
        "format": format
        }
        f = self.execute(args)
        return f
    def doTitlesQuery3(self, titles="Main Page",
   rvprop="timestamp|user|comment", rvlimit="50", rvdir="older", format="xml"):
        args={
        "action": "query",
        "titles": titles,
        "prop": "revisions",
        "rvprop": rvprop, #timestamp|user|comment|content
        "rvlimit": rvlimit,
        "rvstartid": "77",
        "rvendid": "150",
        #"rvstart": rvstart, #timestamp
        #"rvend": rvend, #timestamp
        "rvdir": rvdir, #newer|older
        "format": format
        }
        f = self.execute(args)
        return f

    def doGeneratorQuery2(self, list_="allpages",
   apfrom="aardvark", apnamespace="0", apfilterredir="all", aplimit="10",
   apprefix="",rvprop="timestamp|user|comment", format="xml"):
        args={
            "action": "query",
            "generator": list_,
            "prop": "revisions",
            "rvprop": rvprop, #timestamp|user|comment|content
            "apfrom":apfrom,
            "apnamespace":apnamespace,
            "apfilterredir": apfilterredir,
            "aplimit": aplimit,
            "apprefix": apprefix,
            "format": format
        }
        f = self.execute(args)
        return f

    def doGeneratorQuery(self, list, format, **args):
        args.update({
        "action": "query",
        "generator": list,
        "format": format}
        )
        f = self.execute(args)
        return f

    def doListQuery(self, list, format, **args):
        args.update({
        "action": "query",
        "list": list,
        "format": format}
        )
        f = self.execute(args)
        return f
    def doListAllpagesQuery(self, apfrom="aardvark", apnamespace="0",
   apfilterredir="all", aplimit="10", apprefix="", format="xml"):
        args={
        "action":"query",
        "list": "allpages",
        "apfrom":apfrom,
        "apnamespace":apnamespace,
        "apfilterredir": apfilterredir,
        "aplimit": aplimit,
        "apprefix": apprefix,
        "prop":"revisions",
        "rvprop":"timestamp|user|comment",
        "format":format
        }
        f = self.execute(args)
        return f

    def doSimpleListAllpagesQuery(self, apfrom="A", apnamespace="0",
   apfilterredir="all", aplimit="10", apprefix="M", format="xml"):
        args={
        "action":"query",
        "list": "allpages",
        "apfrom":apfrom,
        "apnamespace":apnamespace,
        "apfilterredir": apfilterredir,
        "aplimit": aplimit,
        "apprefix": apprefix,
        #"prop":"revisions",# doesn't do anything for the list
        #"rvprop":"timestamp|user|comment",
        "format":format
        }
        f = self.execute(args)
        return f

    def doListLogeventsQuery(self, **args):
        args.update({
        "action":"query",
        "list": "logevents",
        })
        f = self.execute(args)
        return f

    def doListWatchlistQuery(self, **args):
        args.update({
        "action":"query",
        "list": "watchlist",
        })
        f = self.execute(args)
        return f

    def doListRecentchangesQuery(self, **args):
        args.update({
        "action":"query",
        "list": "recentchanges",
        })
        f = self.execute(args)
        return f

    def doListBacklinksQuery(self, **args):
        args.update({
        "action":"query",
        "list": "backlinks",
        })
        f = self.execute(args)
        return f

    def doListEmbeddedinQuery(self, **args):
        args.update({
        "action":"query",
        "list": "embeddedin",
        })
        f = self.execute(args)
        return f

    def doListImagelinksQuery(self, **args):
        args.update({
        "action":"query",
        "list": "imagelinks",
        })
        f = self.execute(args)
        return f

    def doListUsercontribsQuery(self, **args):
        args.update({
        "action":"query",
        "list": "usercontribs",
        })
        f = self.execute(args)
        return f

    def execute(self, args):
        " " "
        This is a generate method called by the convenience methods.
        The request takes place in three stages. First, the cookie jar
        is initialized and the cookie file is loaded if it already exists. Then,
        the dictionary "args" is urlencoded and urllib2 generates the HTTP request.
        The result of the request is returned as a file-like object. Once it is
        received, the cookie data is saved so that it will be available for the
        next request, and the data is returned to the calling method.
        " " "
        cj = self._initCookieJar()
        req = urllib2.Request(QUERY_URL, urllib.urlencode(args), HEADERS)
        f = urllib2.urlopen(req)
        self._saveCookieJar(cj)
        return f


if __name__ == '__main__':
    # Test methods
    api = ApiRequest()
    f = api.doHelp()

    if api.doLogin("Mchoate", "connor"):
        print "Login was successful.\n\n"
    else:
        print "Login failed.\n\n"
    print "---------------------------------------\n"

    f = api.doTitlesQuery(titles="Main Page", prop="info", meta="siteinfo", format="xml")

    f = api.doTitlesQueryNoMeta(titles="Main Page", prop="info", format="xml")

    f = api.doTitlesQueryNoMeta(titles="Main Page", prop="revisions", format="xml")

    f = api.doSimpleTitlesQuery(titles="Main Page", format="xml")

    f = api.doTitlesQuery2(titles="Main Page", rvprop="timestamp|user|comment",
   rvlimit="10",rvdir="older", format="xml")

    f = api.doTitlesQuery3(titles="Main Page", rvprop="timestamp|user|comment",
   rvlimit="10",rvdir="older", format="xml")

    f = api.doTitlesQuery3(titles="Main Page", rvprop="timestamp|user|comment",
   rvlimit="10",rvdir="newer", format="xml")

    f = api.doGeneratorQuery2(list_="allpages", rvprop="timestamp|user|comment",
   format="xml")

    f = api.doListAllpagesQuery()

    f = api.doSimpleListAllpagesQuery()

    f = api.doListQuery("allpages", "xml")

    f = api.doGeneratorQuery("allpages", "xml")

    f = api.doGeneratorQuery("allpages", "xml", prop="revisions")

Summary

In this chapter, you learned how to configure and run sample scripts from the pywikipedia bot, as well as how to interact with the new MediaWiki API using Python. These tools can be used to automate certain administrative tasks and can save administrators a significant amount of time. Eventually, the full MediaWiki API will make it possible to create robust client applications for MediaWiki.

In the next chapter, you will learn about site maintenance and administration of your wiki, including performance management through caching.
