Currently this page reflects the vision of KeirMierle, and not necessarily the community as a whole. By integrating consensus from mailing list discussions, I will refine and polish this vision and form a plan of action such that the community can move the numpy+scipy+ipython+matplotlib ensemble closer to the vision outlined below.
THIS IS NOT COMPLETE YET
See the following post for further discussion of the difference between the vision for a new PyLab expressed on this page, and the existing pylab package which is part of matplotlib:
The PyLab Vision
To make PyLab an easy-to-use, well-packaged, well-integrated, and well-documented numeric computation environment so compelling that, instead of people coming to Python and discovering that it is suitable for numeric computation, they will find PyLab first and then fall in love with Python.
The philosophy behind this vision is to consider Rails and Ruby; while Ruby was somewhat popular beforehand, it was Rails which propelled it to the forefront.
At the moment, the combination of Python, NumPy, SciPy, Matplotlib, and IPython provides a compelling environment for numerical analysis and computation. Unfortunately, the picture is not nearly so rosy for those who are not already familiar with Python and the intricacies of building their own Python environment. Newcomers do not know that different modules export conflicting names, that the best list of NumPy examples is found on the wiki in a non-obvious place (and that the docstrings are not the best documentation), or that the speed of linear algebra operations depends on a carefully compiled combination of LAPACK, ATLAS, and Goto BLAS. There is a host of other reasons as well, some outlined below.
API Consistency - Create an official API for the PyLab system such that there is an official way to import the PyLab packages, and such that there are not multiple functions with very similar names in different packages.
Installation - Make the installation process trivial, especially for people without root access or spare time.
A simple user story
Joe is frustrated with MATLAB, because he finds it slow when running his neural network experiments. He hears about PyLab from a friend, who recommends it as an alternative.
1. He finds PyLab as the first search result on Google.
2. He finds a page with minimal clutter, showing a couple of pictures of PyLab and a direct download link to a binary for his operating system. He notes that pylab.org must have determined automatically that he is running Linux. The page also has a small number of big, clear links to promotional materials (screencasts, testimonials), documentation, and community information (how to get involved).
3. After downloading, the program installs with no hassles, and Joe can launch PyLab by typing 'pylab' and pressing enter in a terminal. Joe is happy that there was no hassle over dependencies on his older university computer, and that installing directly into his home directory (he does not have root access on the university computers) is not a problem.
4. PyLab notices that it is being run for the first time, suggests he read the tutorial, and provides a link.
5. Joe clicks the tutorial link, which his terminal automatically opens in a browser. The tutorial is clearly written; it explains some of the philosophy of PyLab and covers the basics of array computation and 2D graphing.
6. When Joe is implementing code, he finds the interactive help invaluable. He gets it by typing any object or function followed by a '?' at the interactive prompt. The documentation has copious examples and helpful pointers to other functions which may be useful (See also:).
7. Joe implements and runs his neural network simulations, and manages to speed them up by using one of the several methods of optimizing computation described in the tutorial. He is so pleased with the results that he suggests to his instructor that the entire class should switch to PyLab, as it is free and, as far as Joe can tell, superior in almost every way to MATLAB.
There are some details omitted here (such as in step 3, does Joe untar the downloaded file or is it an executable?), but those are not the point. The point is that PyLab is a compelling, integrated, usable and superior alternative to MATLAB.
Why the PyLab name? Isn't that already taken by Matplotlib?
PyLab should be the name of the entire suite, and I feel strongly that the correct way to import the entire core PyLab API should be via
from pylab import *
This should include the core parts of numpy, scipy, and matplotlib. This should also be the default namespace set up when the program is launched interactively via 'pylab'. Whether the other components (such as numpy.linalg.*) should be included in this import is up for debate.
1. Revamping the Documentation
For now this section only talks about docstrings, and ignores the other forms of documentation (tutorial, guide, etc). If you want to fill out this section please do!
Available modules in docstrings
Examples in docstrings are extremely valuable. However, no docstring in either NumPy or SciPy currently uses any of the functionality offered by matplotlib. This is unfortunate, especially in the case of SciPy, because often the clearest way of demonstrating a function is to plot something.
While one can argue that SciPy should not become dependent on matplotlib, this dependence already exists for all intents and purposes. Only in the most extreme cases is one likely to want SciPy without matplotlib. Furthermore, the dependency would only arise when a user is executing docstring examples in the interactive interpreter; in that case, it is highly likely that the user is doing something which requires some sort of plotting package regardless.
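As a rough illustration of what such a docstring could look like, here is a minimal sketch. The function moving_average is a hypothetical example written for this page, not part of NumPy or SciPy. The plotting lines live only inside the docstring and are marked to be skipped by doctest, so the function itself does not depend on matplotlib.

```python
def moving_average(values, window):
    """Return the simple moving average of a sequence.

    Parameters
    ----------
    values : sequence of numbers
    window : int
        Number of consecutive items averaged together.

    Examples
    --------
    >>> moving_average([1, 2, 3, 4], 2)
    [1.5, 2.5, 3.5]

    Plot the smoothed series next to the raw one (needs matplotlib):

    >>> import matplotlib.pyplot as plt                      # doctest: +SKIP
    >>> raw = [1, 4, 2, 5, 3, 6]
    >>> plt.plot(raw, label="raw")                           # doctest: +SKIP
    >>> plt.plot(moving_average(raw, 3), label="smoothed")   # doctest: +SKIP
    >>> plt.legend(); plt.show()                             # doctest: +SKIP
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    # One average per full window position.
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

A reader at the interactive prompt can paste the plotting lines directly, while automated doctest runs skip them.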
2. Fixing API consistency
The current PyLab ensemble has issues with API consistency, mainly stemming from Matplotlib's compatibility layer. For example, consider the load function, which exists in both NumPy and Matplotlib:
>>> from pylab import *
>>> load.__doc__
'\n Load ASCII data from fname into an array and return the array.\n\n ....'
>>> from numpy import *
>>> load.__doc__
'Wrapper around cPickle.load which accepts either a file-like object or\n a filename.\n '
These two functions do not match! They are clearly different, yet each accepts a filename, and each will break in mysterious ways when one is expecting the functionality of the other. This is especially hard to track down when one is editing and running a script whose import order causes numpy.load to be present, while in the interactive terminal the user has open, pylab.load is the exposed function. This is bad.
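The import-order problem can be reproduced without either package. The sketch below uses two throwaway modules, fake_pylab and fake_numpy, which are hypothetical stand-ins and not the real libraries, and a small helper that mimics what a star import does for a module without __all__.

```python
import types


def make_module(name, **attrs):
    """Build a throwaway module object holding the given attributes."""
    module = types.ModuleType(name)
    module.__dict__.update(attrs)
    return module


def star_import(namespace, module):
    """Mimic `from module import *` for a module that defines no __all__."""
    for name, value in vars(module).items():
        if not name.startswith("_"):
            namespace[name] = value


# Hypothetical stand-ins for the two conflicting load functions.
def load_ascii(fname):
    return "ASCII array from " + fname


def load_pickle(fname):
    return "unpickled object from " + fname


fake_pylab = make_module("fake_pylab", load=load_ascii)
fake_numpy = make_module("fake_numpy", load=load_pickle)

# The script imports pylab first, so the later numpy import wins.
script_ns = {}
star_import(script_ns, fake_pylab)
star_import(script_ns, fake_numpy)

# The interactive session imported them the other way around.
terminal_ns = {}
star_import(terminal_ns, fake_numpy)
star_import(terminal_ns, fake_pylab)

print(script_ns["load"]("data.txt"))
print(terminal_ns["load"]("data.txt"))
```

The same call, load("data.txt"), reaches a different function in each namespace, and nothing warns the user that a name was silently replaced.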
Another example is the confusion around the min and max functions, which overwrite the builtins and then break in weird and unexpected ways. It appears this is now sorted out, with numpy.amax and numpy.amin being the array versions, and numpy.min and numpy.max being new names for numpy.amin and numpy.amax. I feel as though from numpy import * should import min and max, but import a min and a max that throw an exception!
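To make the last suggestion concrete, here is a minimal sketch of a package whose star import binds a min and a max that refuse to run. The module fake_array_pkg and the helper _refuse are hypothetical names invented for this illustration; this is not how NumPy behaves.

```python
import builtins
import sys
import types


def _refuse(name):
    """Return a stand-in for `name` that always raises when called."""
    def guard(*args, **kwargs):
        raise TypeError(
            f"{name}() is ambiguous after a star import; call "
            f"builtins.{name} or the array version explicitly"
        )
    guard.__name__ = name
    return guard


# Hypothetical array package, not the real NumPy.
fake_array_pkg = types.ModuleType("fake_array_pkg")
fake_array_pkg.amin = lambda seq: builtins.min(seq)
fake_array_pkg.amax = lambda seq: builtins.max(seq)
fake_array_pkg.min = _refuse("min")
fake_array_pkg.max = _refuse("max")
fake_array_pkg.__all__ = ["amin", "amax", "min", "max"]

# Register the module so a real star import can find it.
sys.modules["fake_array_pkg"] = fake_array_pkg

namespace = {}
exec("from fake_array_pkg import *", namespace)

print(namespace["amin"]([3, 1, 2]))
try:
    namespace["min"]([3, 1, 2])
except TypeError as exc:
    print("refused:", exc)
```

The design choice here is to fail loudly at the first ambiguous call rather than let either the builtin or the array version win silently.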
Here's a list of conflicts between SciPy and Matplotlib:
In : import pylab
In : import scipy
In : p=set(dir(pylab))
In : s=set(dir(scipy))
In : k=p.intersection(s)
In : conflicts = [f for f in k if getattr(scipy, f) is not getattr(pylab, f)]
In : import matplotlib
In : matplotlib.__version__
Out: '0.87.7' <--- this was a pre 0.90 svn IIRC
In : conflicts
Out: 'cumsum'
In : pylab.cumsum?
Type: function
Base Class: <type 'function'>
String Form: <function cumsum at 0xb6731924>
Namespace: Interactive
File: /usr/lib/python2.4/site-packages/numpy/oldnumeric/functions.py
Definition: pylab.cumsum(x, axis=0)
Docstring:
<no docstring>

In : scipy.cumsum?
Type: function
Base Class: <type 'function'>
String Form: <function cumsum at 0xb774d5a4>
Namespace: Interactive
File: /usr/lib/python2.4/site-packages/numpy/core/fromnumeric.py
Definition: scipy.cumsum(x, axis=None, dtype=None, out=None)
Docstring:
Sum the array over the given axis.

In : conflicts
Out: ['cumsum', 'ptp', 'fix', 'ravel', '__file__', 'ones', 'rank', 'tri',
'insert', 'arange', 'indices', 'loads', 'where', 'mean', 'argmax', 'nonzero',
'asarray', 'sum', 'polyfit', 'prod', 'log2', 'power', 'cumproduct', 'corrcoef',
'meshgrid', '__name__', 'cov', 'cumprod', 'vander', 'arccos', 'load', 'array',
'iterable', 'eye', 'log', 'sometrue', 'alltrue', 'zeros', 'log10', '__doc__',
'empty', 'polyval', 'arcsin', 'arctanh', 'linspace', 'typecodes', 'copy',
'std', 'fromfunction', 'argmin', 'trapz', 'binary_repr', 'sqrt', 'take',
'product', 'repeat', 'trace', 'compress', 'array2string', 'amax', 'identity',
'amin', 'fromstring', 'average', 'base_repr', 'reshape']
Most of these come in via oldnumeric, but not all. Either way, no oldnumeric function should be exposed via pylab.
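The conflict check can be wrapped in a small reusable function. The sketch below is one possible version; the name find_conflicts is my own, and unlike the session above it skips underscore-prefixed names such as __file__ and __doc__, which differ between any two modules and are only noise. It is demonstrated on the standard library modules math and cmath so that it runs without any of the PyLab packages installed.

```python
import cmath
import math


def find_conflicts(mod_a, mod_b):
    """Return public names bound to different objects in the two modules."""
    shared = set(dir(mod_a)) & set(dir(mod_b))
    return sorted(
        name
        for name in shared
        if not name.startswith("_")
        and getattr(mod_a, name) is not getattr(mod_b, name)
    )


# math.sqrt and cmath.sqrt share a name but are different functions.
conflicts = find_conflicts(math, cmath)
print(conflicts)
```

Calling find_conflicts(pylab, scipy) in an environment where both are installed would give the equivalent of the list above. The identity test (is not) flags a name only when the two modules hold different objects, so a function that one package merely re-exports from the other is not reported.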
The single import statement
What most users want is for a single import statement to get a consistent set of packages which fulfil most of their needs. This should consist of:
from pylab import *
from pylab import *
from numpy import *
from scipy import *
But there are so many names!
Not really. from scipy import * brings in about 20 subpackages (e.g. signal, so that you still need to write signal.ifft, but not scipy.signal.ifft) and only 15 new symbols.
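Counts like these can be checked rather than estimated. The following sketch lists the names a star import would bind and separates subpackages from other symbols. The helper names star_import_names and partition_names are hypothetical, and the standard library module os stands in for scipy so the example runs anywhere.

```python
import os
import types


def star_import_names(module):
    """Names that `from module import *` would bind."""
    declared = getattr(module, "__all__", None)
    if declared is not None:
        return set(declared)
    # Without __all__, a star import takes every name not starting with "_".
    return {name for name in vars(module) if not name.startswith("_")}


def partition_names(module, names):
    """Split names into (submodules, other symbols), both sorted."""
    submodules = sorted(
        name
        for name in names
        if isinstance(getattr(module, name, None), types.ModuleType)
    )
    symbols = sorted(set(names) - set(submodules))
    return submodules, symbols


already_bound = {"sep"}  # pretend an earlier star import provided this name
new = star_import_names(os) - already_bound
submodules, symbols = partition_names(os, new)
print(len(submodules), "submodules and", len(symbols), "new symbols")
```

Running the same two functions against scipy, with already_bound set to the names from the pylab and numpy imports, would reproduce or update the figures quoted above.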
How to fix the API
3. Fixing installation process
The installation process has certainly gotten easier over the years; however, the packages in PyLab core should be bundled into a cohesive whole so that the user is not even aware that there are several packages beneath the PyLab label. Certainly, for the busy undergrad who needs to get his signal processing homework done but can't afford MATLAB and doesn't have root access on his university computers, it makes sense to have a monolithic binary which bundles everything (à la SAGE).
4. Fixing the Build process
Building a basic PyLab setup is straightforward on Debian and Ubuntu thanks to package management, provided one is capable of fishing around for the required build packages. However, a straightforward
python setup.py build
will not produce the most optimized executable; in order to get a highly optimized PyLab system, an elaborate dance is required to compile the BLAS, then Goto BLAS, then LAPACK with that BLAS, then replace a subset of LAPACK with optimized versions from ATLAS. The whole process is entirely unintuitive and, as far as the author can tell, not clearly documented anywhere, even for Linux.
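One small step that helps when attempting this is checking which BLAS and LAPACK libraries an existing NumPy build was actually linked against. The sketch below captures the text printed by numpy.show_config(), which is a real NumPy function; the wrapper describe_blas_build is a hypothetical helper, and the layout of the report differs between NumPy versions, so the output is returned as-is rather than parsed.

```python
import contextlib
import io


def describe_blas_build():
    """Return NumPy's build configuration report as a string."""
    try:
        import numpy
    except ImportError:
        return "NumPy is not installed"
    buffer = io.StringIO()
    # show_config() prints its report; capture it instead of parsing it.
    with contextlib.redirect_stdout(buffer):
        numpy.show_config()
    return buffer.getvalue()


print(describe_blas_build())
```

If the report mentions only a reference BLAS, the linear algebra routines are probably not running at the speed an optimized build would give.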
Becoming a better foundation for SAGE
There is a package called SAGE which aims for almost exactly the same goals as PyLab. However, it is even more extreme than the PyLab vision outlined here, because SAGE includes many third-party programs for cutting-edge support of symbolic computation. It also makes some incompatible changes to the Python syntax.
SAGE is built from a core of Python, IPython, and NumPy. In a posting to the SAGE developer list, the lead SAGE developer, William Stein, described how he wishes NumPy and SciPy would follow more consistent documentation standards. Shortly thereafter Travis Oliphant committed the documentation standard which should be used in NumPy and SciPy. By slowly working the docstring documentation into a consistent state, PyLab can form a more consistent and usable foundation for SAGE.