This is an archival dump of old wiki content --- see scipy.org for current material

Mlabwrap is a high-level python-to-matlab bridge. This wikipage is currently intended for developers and not for users. Please have a look at the sourceforge page to get a quick overview.

Infrastructure

Design Goals

Mlabwrap strives to

  1. be painless to install and use (and upgrade)
  2. (where possible) come close to giving the user the impression of using a python library rather than calling into a different language
  3. meet the needs of NIPY

One reason for 2. (rather than, say, striving to emulate the look and feel of matlab as closely as possible) is that it should make it easier to gradually replace legacy matlab code with python code, and for python(-only) programmers to maintain and understand code that uses mlabwrap to leverage an existing matlab code base. The downside is that there is enough semantic difference between matlab and python (see below) to keep this goal from being fully attainable -- trade-offs will have to be made.

I think it's therefore important that these trade-offs are informed by actual usage patterns, which is why 3. is likely to be vital for arriving at a good design; deriving things from first principles is unlikely to be an adequate substitute.

Principles

Some concrete goals and plans for v. 1.0 and immediate future

Compatibility

Installation

Painless Install:

Limitations and issues with the current design and implementation

Differences between numpy and matlab that complicate bridging

more ops
python lacks the following:
  • *, /, ^ (instead offering .*, ./, .^)

  • \, .\

  • ', .'

  • ,, ;

  • {}, subsindex

  • kron

  • any, all

  • set ops: union (could use |), unique, intersect (could use &), setdiff, setxor, ismember (could use in, but there's a 3-argument version, too)

  • some bit-ops: bitget, bitset, the binary version of bitcmp

  • size (could map to .shape); not quite sure whether numel is somehow relevant for mlabwrap
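Most of the matlab operators listed above do have numpy counterparts, just as named functions or methods rather than syntax. A small illustrative sketch (numpy only, no matlab session required):

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])

prod = a @ b                            # matlab's matrix multiply a*b (also np.dot)
x = np.linalg.solve(a, b)               # matlab's left division a\b
ct = a.conj().T                         # matlab's ' (conjugate transpose); .' is just .T
k = np.kron(a, b)                       # matlab's kron
u = np.union1d([1, 2, 3], [3, 4])       # matlab's union
i = np.intersect1d([1, 2, 3], [3, 4])   # matlab's intersect
shape, count = a.shape, a.size          # roughly matlab's size and numel
```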
fastest varying index
matlab is column-major, python is row-major; apart from performance penalties when converting, this also has implications for how data is preferentially arranged. This also interacts with dimensionality, see below.
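A quick numpy-only illustration of the layout difference:

```python
import numpy as np

# numpy defaults to row-major (C) storage; matlab stores arrays column-major.
a = np.arange(6).reshape(2, 3)     # [[0, 1, 2], [3, 4, 5]]

row_major = a.ravel(order='C')     # the order numpy stores: 0 1 2 3 4 5
col_major = a.ravel(order='F')     # the order matlab would store: 0 3 1 4 2 5

# converting at the boundary therefore implies a reordering -- unless the
# array is created in Fortran order to begin with:
f = np.asfortranarray(a)
```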
dimensionality
I can think of two sane and internally consistent ways to handle array dimensionality:
intrinsic dimensionality
In numpy dimensionality is intrinsic to an array (e.g. a = array(1) has dimensionality, i.e. numpy.ndim(a), 0, and a[0,0,0,0] will throw an error).

sane context dependent dimensionality (scdd)
In matlab dimensionality is context dependent (e.g. a=1; a(1,1,1,1) will work fine). A sane way to have done this would be to conceptualize everything as an array with an infinite number of leading (or trailing, if one desires column major) unit dimensions and to determine the desired actual dimensionality by context (ignoring leading unit dimensions by default). In other words, under that scheme 1, [1], [[1]], [[[1]]] are all the same object with the same physical representation, and when context doesn't determine the dimensionality (e.g. via the number of subscripts when indexing), one assumes by default the dimensionality of the array sans leading(/trailing) unit dimensions. The (IMO minor) advantage of this scheme is that it is sometimes convenient to regard one and the same object as e.g. a scalar or a 1x1 matrix, depending on context (as is often done in math). The (IMO major) disadvantage is that one loses the ability to regard arrays as nested container types (e.g. in numpy a[0] is legal and has an obvious meaning for any a with non-zero dimensions, and it holds that ndims(a[0]) == ndims(a)-1. But this equality doesn't hold in scdd, and without some arbitrary convention (such as matlab's flat-indexing) a[0] is not even meaningful when there is more than one non-unit dimension).
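For contrast, the numpy side of this: dimensionality is a fixed property of the array, and indexing peels off exactly one dimension, so arrays really do behave like nested containers:

```python
import numpy as np

a = np.ones((2, 3, 4))
assert a.ndim == 3
assert a[0].ndim == 2          # one subscript removes exactly one dimension
assert a[0, 0].ndim == 1

s = np.array(1)                # a 0-d array: no dimensions at all
assert s.ndim == 0
try:
    s[0, 0, 0, 0]              # extra subscripts are an error, unlike in matlab
    raised = False
except IndexError:
    raised = True
```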

Of course, matlab being matlab, it doesn't implement either of these schemes, opting for something messier instead: I think the idea basically is that everything is a matrix, unless it has too many (non-unit trailing) dimensions. As an example, in the 'sane context dependent dimensionality' scheme detailed above ndims(a) would be []; in matlab it is 2, as is ndims(ones(1,1,1)) and ndims(ones(2,1,1)), but not ndims(ones(1,1,2)), which is 3. Another annoyance is that matlab isn't really consistent in its column-major vantage point: flats (i.e. x(:)) are column vectors, but very basic commands like linspace and the : operator return row vectors -- in other words, unlike in numpy there is no 'canonical' vector type, further complicating DWIM conversion attempts.

dereferencing, nullary-call and indexing
  • Matlab doesn't syntactically distinguish between dereferencing a variable and calling a nullary function -- a in matlab can mean a in python, or equally a(). Similarly, function call and indexing are syntactically indistinguishable; a(1) could be either a(1) or a[0] in python. One issue where this comes up is determining how mlab.some_var ought to behave.
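The ambiguity can be sketched with a toy proxy. The `engine` dict below stands in for a live matlab session, and none of these names are mlabwrap's actual internals; the point is just that a single `__getattr__` has to pick a policy:

```python
class MlabSketch:
    """Toy stand-in for an mlab-style bridge object (purely illustrative)."""

    def __init__(self, engine):
        self._engine = engine      # maps names to plain values or callables

    def __getattr__(self, name):
        obj = self._engine[name]
        # one possible policy: treat callables as nullary function calls,
        # everything else as a plain variable dereference
        return obj() if callable(obj) else obj

mlab = MlabSketch({'some_var': 42, 'some_fun': lambda: 'called'})
```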

indexing and attribute access and modification
`{}` vs `()`

currently done with the `_` hack; TODO: maybe add a way to associate x[key] with {} indexing when x belongs to a certain set of classes.

`subsref` and `subsasgn` are non-recursive
By that I mean that whereas in python (only) `x.y` gets to handle the attribute access to `z` in `x.y.z`, in matlab it's actually `x` that does. Python's behavior becomes an issue with the current (i.e. mlabwrap 1.0) scheme for assigning to attributes.

1-based indexing and `end` arithmetic

(Aside: `end` is pretty nasty; you'd think matlab might just call `size` to figure out the `end` for a given dimension, but not so: unless you define your own `end` method, it just silently uses '1' -- however not for the a(:) syntax, which one might erroneously assume to resolve to something like a(1:end). Interestingly, although I know how to define an `end` method, I haven't figured out how to call it directly...)
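A hypothetical helper showing the subscript translation involved (the function name and calling convention are made up for illustration):

```python
def matlab_to_python_index(i, length=None, end_offset=None):
    """Translate a 1-based matlab subscript to a 0-based python index.

    Plain subscripts: a(i) -> a[i - 1].
    `end` arithmetic: a(end - k) is passed as end_offset=k and needs the
    array length, since a fixed index like python's -1 only matches `end`
    itself, not general `end` arithmetic.
    """
    if end_offset is not None:
        return length - 1 - end_offset
    return i - 1

assert matlab_to_python_index(1) == 0                      # a(1)     -> a[0]
assert matlab_to_python_index(None, 5, end_offset=0) == 4  # a(end)   -> a[4]
assert matlab_to_python_index(None, 5, end_offset=1) == 3  # a(end-1) -> a[3]
```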

strict slices (matlab) vs permissive ones (python)

[][1:1000] will work fine in python (for arrays and lists), but not in matlab. Although it would be possible to try to hide this (e.g. by doing something like thing(min(slice_start,end),min(slice_end,end))), it's presumably not worth the trouble.
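Illustrating the permissive side:

```python
# python slices clamp silently: out-of-range bounds just yield a short
# (or empty) result, whereas the analogous matlab subscript is an error
a = [10, 20, 30]
head = a[1:1000]       # no error, despite len(a) == 3
empty = [][1:1000]     # also fine, just empty
```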

dtypes

Although matlab has logical (bool), {u,}int{8,16,32,64}, as well as single, double and char arrays, and therefore a pretty good correspondence to the available numpy dtypes (apart from the fact that {single,double} floats are conflated with {single,double} complex floats), the mapping is complicated by the fact that double takes a very dominant role in matlab (e.g. IIRC the various int types only recently grew even the standard arithmetic operators and have hence largely only been used to represent integral values when using doubles was somehow too expensive or otherwise impossible). Currently mlabwrap "solves" this by just converting everything (save strings) to double, but with matlab's growing emancipation of non-64-bit-float matrix datatypes, this may become a less attractive trade-off.
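The current policy amounts to something like this sketch (numpy side only; the function name is illustrative, not mlabwrap's actual code):

```python
import numpy as np

def to_matlab_value(x):
    """Sketch of the conversion policy described above: strings pass through,
    everything numeric is coerced to double (float64) before being sent."""
    if isinstance(x, str):
        return x
    return np.asarray(x, dtype=np.float64)

# an int16 array silently becomes a double array at the boundary
sent = to_matlab_value(np.arange(3, dtype=np.int16))
```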

call-by-value, copy-on write (matlab) vs. proper object identity (python)
multiple value return

not much of an issue (the nout arg seems to handle this fine)

broadcasting
an issue for mixed python/matlab object operations?
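For reference, numpy broadcasting stretches unit dimensions automatically, which matlab of this era handled via repmat or bsxfun instead; how such operations should behave when one operand is a matlab proxy is the open question:

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4).reshape(1, 4)   # shape (1, 4)

table = col + row                  # broadcasts elementwise to shape (3, 4)
```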

Marshaling vs. Proxying

Mlabwrap currently uses two different approaches to make matlab objects available in python: marshaling (i.e. creating an 'equivalent' native python object from a temporary matlab original) for some types (currently matrices and strings), and proxying (creating a proxy object that delegates to the matlab object, which is kept around).
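The split might be sketched like this (the class names, the `Proxy` stand-in and the `convert` function are purely illustrative, not mlabwrap's real implementation):

```python
# matlab classes whose values get copied into native python objects
MARSHALED_CLASSES = {'double', 'char', 'logical'}

class Proxy:
    """Stand-in for a proxy that delegates to an object kept alive in matlab."""
    def __init__(self, matlab_name):
        self.matlab_name = matlab_name

def convert(matlab_class, value, name):
    if matlab_class in MARSHALED_CLASSES:
        return value               # marshal: hand back a native python object
    return Proxy(name)             # proxy: keep the original on the matlab side

as_python = convert('double', [1.0, 2.0], 'x')
as_proxy = convert('some_user_class', None, 'obj')
```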

Each method has its downsides and upsides, and the key design questions for the next version of mlabwrap are what mix of the two mlabwrap should employ, and how the user can fine-tune it. Here are some of the issues with various scenarios:

Problems with a pure-marshaling scheme

Problems with a pure-proxying scheme

Hybrid-scheme 1: pretty much as currently - proxy only objects (and possibly structs), but not arrays, strings etc.

Hybrid-scheme 2: Also proxy object attributes (on access), even if they could be marshaled (largely so that subattribute assignment can be made to work)

(FIXME expand this section)

Problems with hybrid-scheme 1

Problems with hybrid-scheme 2

Summary: I think pure marshaling ought to be available as an option (and maybe pure-proxying, too), but I suspect that a hybrid approach will make the best default.

Hacks at our disposal to fine-tune (conversion etc.) behavior

Generally speaking, Matlab and python are semantically too different for mlabwrap to be able to offer default behavior that always produces the desired or expected results. Here are ways in which fine-tuning of behavior is or could be implemented:

As hinted above too much flexibility can be a bad thing; the less customization possibilities the design needs whilst retaining generality and convenience for the common cases, the better.

Open questions

SciPy: MlabWrap (last edited 2015-10-24 17:48:26 by anonymous)