Subsections

5. Writing Optimized Compiled Code With SAGE

This chapter is about how to write fast compiled optimized code for SAGE. It is still unfinished.

This chapter assumes that you have a C compiler suite (e.g., GCC) installed on your computer, since you should not write compiled extension code on a computer without a C compiler. If you are using OS X, download and install the latest version of XCode.5.1 For Linux, use your package manager to install GCC. If you're using Microsoft Windows, then you should be using SAGE via colinux, which includes GCC by default.

Interpreted code: way easier to debug, trace, document, structure (multiple inheritence), etc.; good when few for loops; lots of structure and calls to compiled code.

Compiled code: two reasons to use--make fast code and make optimal use of libraries written in a compiled language; hard to write and debug; takes longer to write. The methods explained in this chapter can produce code where speed is only limited by your choice of algorithm.

TODO:

  1. Module name and file name must agree!

5.1 Introduction

This chapter describes how to write SAGE programs that use Pyrex. Pyrex is not used as widely as several other programs for writing fast compiled code for Python. One reason is because the official Pyrex documentation is not so good. Also there are a couple of shortcomings in the standard distribution of Pyrex.

The primary goal of this chapter is to provide excellent documentation about how to use Pyrex to write very fast code in the context of SAGE. This chapter is self contained in that the reader should not have to look at any of the documentation distributed with Pyrex. The reader is urged to at some point read the Python/C API Reference manual.

There is also much to be learned from the Pyrex mailing list [#!pyrex:list!#].

5.1.1 Interpreted and Compiled Code

Every serious computer algebra system, e.g., MAGMA, PARI, Mathematica, Maple, GAP, Singular, etc., is implemented as a combination of compiled and interpreted code. For Mathematica, Singular, MAGMA, and PARI, most of the implementation is in compiled C (or C++) code; some of these systems tend to be very optimized (subject to the constraints of the algorithms they implement). In contrast, Maple and GAP have a relatively small compiled ``kernel'' that defines the underlying programming language; most of the system is then implemented in this language. If you do benchmarks you'll discover that Mathematica is much faster than Maple at some basic operations. Likewise, benchmarks reveal that MAGMA is often faster than GAP.

This fusion of interpreted and compiled code is extremely natural for mathematics software; some algorithms are much better implemented in an interpreter because all time-critical steps involve low level arithmetic -- other algorithms, e.g., matrix multiplication, must be implemented in a compiled language in order to perform optimally. Also, existing compiled libraries are sometimes of very high quality, and a compiled language is needed to create the best possible interface to such libraries. It's crucial that both approaches to programming be fully supported. When deciding how to implement SAGE, I searched for months for an environment that could support both approaches. Python and Pyrex provide exactly this in a way that I believe is much easier conceptually than the implementation models of any of the other systems mentioned above.

5.1.2 Why Pyrex is the Best Available Option for SAGE

The software listed in the title of this section is used either to speed up Python programs or to make code written in C++, C, FORTRAN, etc., available in Python. This section briefly explains what each tool does and why it is not included in SAGE or used in the implementation of the SAGE library.

5.1.2.1 SWIG

SWIG (Standard Wrapper and Interface Generator) takes C++ (and C, etc.) libraries, and generates wrapper code so that the code can be used from within Python (and in several other interpreters). One writes interface files that describe how to make Python objects usable from your C++ code, and conversely what transformations C++ objects should undergo to be usable from Python.

In 2004, Stein considered using SWIG for SAGE extensively, and implemented several basic types using this model. The idea was to write code in C++ for SAGE that needed to be fast, then wrap it in SWIG. This ground to a halt, because the result was not sufficiently fast. First, there is overhead when writing code in C++ in the first place. Second, SWIG generates several layers of code between Python and the code that does the actual work; for implementing any of SAGE's numerous basic arithmetic type, e.g., integers, the overhead is completely unacceptable, at least if one wants code that is as fast as Magma and PARI. For example, in my initial implementation of integer arithmetic using SWIG and GMP, there were the following six levels between the SAGE interpreter and the GMP C library:

Python code to provide a clean interface
SWIG Autogenerated Python code
SWIG Autogenerated C++ extension code
Handcode C++ Integer class
GMP's C++ Interface
GMP C Library
In this model, just multiplying two integers is extremely complicated, and slow. Moreover, writing the SWIG interface files was painful, since general Python types are difficult to work with from C++ code. In contrast, with Pyrex the situation is as follows (see integer.pyx):
Pyrex generated C code that provides everything needed
GMP C Library
Another important problem with using SWIG is that developers have to know C++ well and have the mental dexterity to easily switch back and forth between C++ and Python. Stein does not have this ability.

5.1.2.2 Boost

Like SWIG, Boost takes C++ code and generates C/C++ extension code and Python wrappers around that extension code, which one can then use from the Python interpreter. SAGE does not use Boost because it is large and difficult to redistribute and its meant much more for wrapping existing C++ libraries. Boost also uses a very complicated and specialized build system.

5.1.2.3 Psycho

Psycho is a Python library that tries to guess the types of variables in a running Python program, then generate compiled extension code on the fly to optimize the running program. SAGE does not use Psycho, because it only runs on Intel x86 machines it has large and unpredictable memory requirements, and so much behind-the-scenes guessing at this level for a serious research mathematics tool is inappropriate.

Moreover, in my own experience with SAGE, the speedups from Psycho were not significant, except on code that had an obvious C equivalent. Such code can be easily rewritten in Pyrex, which always results in something that is faster, more robust, and more reliable than what Pyscho produces, and which can be debugged using gdb.

Though Psycho is not included with or used by SAGE, it is possible to try out in Psycho by installing it and putting the appropriate line at the top of a file.

5.1.2.4 Ctypes

Ctypes is included standard in Python 2.5, so it is clearly useful to many Python users. It allows one to call library functions defined in shared object libraries directory from interpreted Python code. For the core of SAGE we have not found a use for this yet. First, when implementing a basic type that needs to be very fast, e.g., GMP integers, the entire implementation absolutely must be compiled - writing it as a combination of Python and C library calls will be way too slow. Second, it is often very painful to convert Python data structures to C data structures and conversely, and there are many opportunities for memory leaks and core dumps.

5.2 Getting Going

5.2.1 Attaching or Loading .spyx Files

The easiest way to try out Pyrex without having to learn anything about distutils, etc., is to create a file with the extension spyx, which stands for ``SAGE Pyrex''. Try it now!

5.2.1.1 Using the Command Line and a File

  1. Create a file hello.spyx.
  2. Put the following in it:
    def hello(name):
        """
        Print hello with the given name. 
        """
        print("Hello %s"%name)
    
  3. Start the SAGE command-line interpreter and load the spyx file (this will fail if you do not have a C compiler installed).
    sage: load "hello.spyx"
    Compiling hello.spyx...
    sage: hello('World')
    Hello World
    
Note that you can change hello.spyx, then load it again and it will be recompiled on the fly. You can also attach hello.spyx so it is reloaded whenever you make changes.
sage: attach "hello.spyx"

5.2.1.2 Using the SAGE Notebook

Create an input cell that contains

5.2.1.3 An Optimization Example

Imagine you are writing a program that frequently computes a*b + c for $ a,b,c$ SAGE integers. For example, you want to do the following computation (timings on a Macbook Pro 2Ghz under OS X):
sage: a = 939393; b = 19393; c = 39390283
sage: time for _ in range(10000): a = a*b + c
CPU times: user 0.33 s, sys: 0.21 s, total: 0.53 s

There is no ``multiply then add'' operation built into SAGE, and you're very frustrated because you really want this to be as fast as possible. There is possibly significant overhead in the computation of a*b + c in Python.

First create a new compiled ``multiply and add'' command in SAGE.

cimport sage.ext.integer
def muladd(sage.ext.integer.Integer a,
           sage.ext.integer.Integer b,
           sage.ext.integer.Integer c):
    """
    Compute a*b + c
    """
    cdef sage.ext.integer.Integer d
    d = sage.ext.integer.Integer()
    mpz_mul(d.value, a.value, b.value)
    mpz_add(d.value, d.value, c.value)
    return d

Do this either by putting the above code in a spyx file or pasting it into the notebook in a cell that starts with %spyx. (See Section  above.)

Now we can do the above computation more quickly:

sage: a = 939393; b = 19393; c = 39390283
sage: time for _ in range(10000): a = muladd(a,b,c)
CPU times: user 0.11 s, sys: 0.05 s, total: 0.15 s

To squeeze out even more performance we can put the entire loop in single function.

def muladdloop(sage.ext.integer.Integer a,
	       sage.ext.integer.Integer b,
	       sage.ext.integer.Integer c, 
	       int n):
    cdef int i
    cdef mpz_t t
    cdef sage.ext.integer.Integer aa
    aa = sage.ext.integer.Integer(a)
    mpz_init(t)

    for i from 0 <= i < n:
        mpz_mul(t, aa.value, b.value)
        mpz_add(aa.value, t, c.value)

    mpz_clear(t)
    return aa

We then have

sage: a = 939393; b = 19393; c = 39390283
sage: time z=muladdloop(a,b,c,10000)
CPU times: user 0.11 s, sys: 0.00 s, total: 0.11 s
This is a little faster.

5.2.2 A simple loop example: Sum of squares

The following SAGE worksheet illustrates the above sums of squares examples in the SAGE Notebook.
%pyrex
two = int(2)
def sumsquarespy(n):
    return sum(i**two for i in xrange(1,n+1)) 
#<
time v=[sumsquarespy(100) for _ in xrange(10000)]
#> CPU time: 0.49 s,  Wall time: 0.49 s
def sumsquares(int n):
    cdef int i, j
    j = 0
    for i from 1 <= i <= n:
        j = j + i*i
    return j
#<
time v=[sumsquares(100) for _ in xrange(10000)]
#> CPU time: 0.06 s,  Wall time: 0.06 s

5.2.3 Adding to SAGE

5.2.4 Your Own setup.py

5.3 Pyrex Manual

5.3.1 Public and Private Variables

Test Drive wrote:
> I have tried "cdef 
> object j", however in that case users will not be able to use the j when 
> user will create an instance of the object Test.
 
You want 'cdef public int j' to expose it to Python
code. Or 'cdef readonly int j' if you don't want
them to be able to write to it.
 
Like builtin types, Pyrex extension types don't have
a dict, so you can't assign arbitrary attributes to
them.

5.4 Gotchas

5.4.1 How C extension classes different from Python classes

5.4.2 When to use cdef class versuses def class in Pyrex

def allows multiple inheritance, but no c-level data structures. Maybe creation is faster. What exactly is the deal here?!

5.4.3 SAGE's Pyrex is Patched

5.4.4 There is no +=

5.4.5 Notation for C for Loops

5.4.6 Typos in Function Names and Variables

5.4.7 Doctests

How they work for Pyrex files.

5.4.8 Signal Handling

How it works.

very powerful, but use with care.

evidently there is bad overhead with them on OS X. not so bad in linux.

5.4.9 Introspection

Users currently can't view the source code of functions from the command line. This must change. Ideas for fixing this problem.

If we can find the module name where foo is defined, then check if the appropriate foo.pyx file is available. Load it. Find the function. Figure out which if it is defined twice. Show all if we can't figure it out.

If totally screwed and we can't even find the file, then modify pyrex so it embeds extra info in the docstring. Strip that out on displaying.

5.5 Automatic Coercion to and From C Data Types

Integer literals

type checking

Etc.

How to do explicit casts

5.6 getslice

For fastest speed, use this declaration:
def __getslice__(self, Py_ssize_t i, Py_ssize_t j):
    ...

5.7 The Python/C API

The object type.

5.8 C Header Files

5.9 Writing Your Code in a Mix of Pyrex and C/C++

5.10 Splitting Implementations Across Multiple Files

5.11 Wrapping C++ Code

Pyrex code compiles with C++ compiler. Using pyrexembed.

5.12 Other Optimization Tricks

5.13 Misc

5.13.1 Class methods

In a cdef'd class this is not possible at all.

5.13.2 List Comprehensions

List comprehension in Pyrex is possible. Should this be moved into Pyrex somehow...?
def addit(v, s):
    """
    EXAMPLE: 
        sage: load a.spyx
        Compiling a.spyx...
        sage: addit([1,2,3], 5)
        [5, 10, 15]
    """
    return eval('[x*s for x in v]',  {'v':v, 's':s})



Footnotes

... XCode.5.1
The download is about 1GB, and you must sign up for the Apple Developer Network.
See About this document... for information on suggesting changes.