Research project ideas within the Titanium Project
This page provides a listing of project ideas related to the ongoing work with the
Titanium project. Many are suitable for semester projects, and some have publication
potential. The tasks range from relatively simple, isolated projects that an undergraduate
with moderate programming skills could handle, to very complicated projects that could
involve sweeping changes to the Titanium compiler and runtime system, which should
probably only be attempted by veteran Titanium compiler developers.
Each task is labeled with a brief description and a rough approximation of the
difficulty level. Unless otherwise noted, inquiries regarding tasks should be directed to
the Titanium Developer mailing list.
Also, if you choose to pursue one of these projects, please send mail to that list to keep
us informed so we can prevent conflicts. This list will be updated as new projects are
thought up and others completed or discarded. If you have an idea for a project related
to languages, compilers, optimizers, HPC applications, benchmarking
or parallel computation, there's probably a way we can fit it into the Titanium framework.
Compiler/Optimizer Language Work
- Titanium-to-Java translator (medium) - we'd like an automated way to
translate Titanium programs into shared-memory, true-blue Java programs with threads. This
would give Titanium users an upgrade path when moving to platforms not currently supported
by Titanium, and provide a way to make some performance comparisons between comparable
apps written in both languages. Probably best implemented by leveraging the existing
Titanium parser and modifying it to dump legal Java source that replaces invocation of
Titanium-specific features with calls to Java methods in a new compatibility layer that
implements things like barrier, broadcast, etc. using multi-threaded Java code.
- Investigate source-to-source compilation issues (medium) -
Evaluate the effect of the Titanium's two-stage compilation implementation
strategy (namely Titanium-to-C and C-to-target code) on sequential
optimization. Are there examples of "standard" optimizations done by C
compilers that are done poorly on our compiler-generated C code? Examples to
look at are register management, constant folding, elimination of extra
temporaries, basic block optimizations. What changes should be made to
Titanium-to-C sequential optimization to better exploit optimization of C code?
This one has definite publication potential.
- Sequential optimizations (medium) - The Titanium optimizer is lacking
some important sequential optimizations that should really be added. Namely:
global copy propagation,
and others. A number of analyses are already present in the optimizer
that should facilitate developing these optimizations.
- Add optimizations to leverage shared/private inference (medium) -
Ben Liblit recently added an analysis and inference system to
Titanium that statically infers which data objects are always private (i.e. never accessed by a
remote processor). It should now be possible to augment the optimizer to leverage this new
information to perform more aggressive sequential optimizations (e.g. instruction reordering and
common sub-expression elimination) on private data.
- Reconcile the shared/nonshared type qualifiers with shared and private regions (hard) -
There's a bit of design work here, and also some good honest compiler hacking. The ultimate goal would be to
eliminate the runtime checks that the region code performs whenever a pointer field is updated. We currently need
those checks at runtime to ensure that objects in shared regions never contain pointers to private objects. If both
systems could be brought together, that invariant could be enforced by the type checker, and therefore most or all the
runtime checks could be eliminated. Contact Ben or
David for more details.
- Non-blocking remote memory read/writes (hard) - The Titanium
distributed runtime system is designed to permit weak (non-blocking) remote memory
operations. However, these operations are currently only utilized in a few hand-optimized
areas of the runtime system. It would be nice to have an analysis that automatically
infers which user-written remote memory operations would still be correct when implemented
with non-blocking reads and writes, and implement an optimization that translates the
appropriate blocking reads/writes to split-phase operations that are less
latency-sensitive. The theoretical basis for this analysis has already been established
(Arvind Krishnamurthy's thesis), and there's even some code written by Arvind, but we
don't have a working implementation in the language yet.
- Improve alias analysis (hard) - Theoretical work and a simple
implementation of pointer alias analysis has been done by
Glen Jeh, but it would be nice to implement a more
- Automatic loop tiling and matrix optimizations (hard) -
We'd like to add some optimizations that detect certain matrix computations (either
through pattern-matching or with explicit annotations) and restructure the computation for
optimal performance on the given architecture. This may include some feedback directed
optimization, similar to the FFT-W, Atlas, and Phipack projects.
Geoff Pike has done some work on generic
loop tiling optimizations, and this work could be extended for specific kernels.
- Fix immutables to be mutable - implement value classes (very hard) - The titanium group has
drafted a proposal for augmenting the immutable language construct to have mutability
properties to support user's needs, without negating the performance advantage of inlined
(stack-allocated) objects. Dan Bonachea and
Paul Hilfinger have the details.
- Extend loop optimizations to Java arrays (very very hard - probably a
lost cause) - Titanium includes array indexing optimizations for Titanium arrays. It would
be nice to extend these optimizations to also work with Java arrays, but the
non-rectangular shape and aribitrary internal aliasing of Java multi-dimensional arrays
makes this nearly impossible. It may still be worthwhile on single-dimensional Java
Language Runtime and Communication System Work
- Implement an application event logger and/or tracing GUI (easy) - We'd like a facility
to log application events, including major events in the runtime system (such as barriers,
broadcasts, and other collective operations) as well as application-defined events
(such as the start/end of various phases of computation) to assist in performance debugging.
Ideally, this would become part of Titanium's standard library and provide some sort of
graphical output to assist programmers.
GASNet contains extensive tracing facilities for gathering this information, and even a prototype
text-based tool for displaying the information that could help with getting this started.
An alternate approach would be to look at transforming the trace information as required to hook-up a pre-existing
parallel performance profiling tool, such as Vampir or Tau.
- Implement Region safety checking (medium) -
Our language reference currently specs out behavior for ti.lang.Region to provide safe
region-based memory management services, by detecting errors (namely deleting a Region
while live references remain) and throwing an exception to prevent the unsafe memory deallocation
(which enables recovery and debugging). Currently the necessary checks are not fully implemented,
leaving a memory safety hole in the implementation. This project involves implementing region
reference counts (following established theoretical algorithms)
in the region library of the
Titanium runtime system, using existing information the compiler already outputs
concerning region access behavior.
- Implement a debugger for Titanium (medium) -
At one point Titanium had a debugger based on gdb, but the code has degraded considerably
due to advances in gdb and the Titanium implementation and language. The translator
still outputs debugging line and symbol information, but we need a
debugger which can read this information and provide source-level debugging for
Titanium applications. Ideally, this debugger would support parallel debugging (threads
or perhaps even distributed-memory), but even a sequential debugger would be a significant
- Investigate vectorization issues in Titanium (medium) -
Titanium runs on vector supercomputer hardware such as the Cray X1, but insufficient work has been
done to ensure good automatic vectorization of the generated C code. This project would involve benchmarking
the performance of existing Titanium applications and test programs on vector machines,
assessing the current impediments to vectorization,
and find ways to improve vectorization of the generated code - starting with hand-modification
of the generated code and runtime, and eventually finding ways to automate these improvements in the compiler.
- Tune garbage collector (medium) - Titanium uses the
Boehm-Weiser garbage collector.
Recent versions of the collector include a number of advanced features (eg incremental collection,
explicit type descriptors, parallel mark and sweep, object finalization, etc)
that we're not currently taking advantage of.
- Add GASNet support for new high-performance networks (medium) -
Titanium has GASNet-based
backends for several networks. We'd like to add GASNet support for new high-performance
networks/machines, such as IBM Blue Gene, NEC SX, Sun Wildfire and/or other high-performance system-area networks.
Contact Dan for more details.
- Fix static (medium) - The current semantics of "static" in
Titanium correspond to "per-processor" static. The Titanium group has decided to
change the semantics of static to mean "per-job".
- Titanium arrays over general domains (hard) - Extend Titanium arrays to
allow them to have a general (non-rectangular) domain. This would mostly involve library development work.
- Experiment with out-of-core support on Titanium arrays (hard) - some research has been done
at Rice University about incorporating language support for out-of-core processing into
multi-dimensional arrays in KeLP,
a C++-based parallel library with some similar goals to
Titanium. It may be interesting to modify the Titanium array library to support automatic
out-of-core processing, whereby array data (or portions of array data) is automatically
moved to and from disk (with possibly some caching) to support the development of concise,
out-of-core algorithms in Titanium.
Java/Titanium Library Work
- STL-like library using Ti templates (easy) - Titanium has extensive
support for templates. It would be nice to use this feature to add some Titanium libraries
analogous to the C++ standard template library that provide templatized support for
various common data structures and algorithms (e.g. sort arbitrary type, hash table
indexed by a provided (possibly non-reference) type, bitsets, type-parameterized sets and
multisets, type-parameterized stack, type-parameterized queue, distributed array library,
distributed task queue, etc.) Some other libraries would also be useful, for example
a getopts-style command-line options parser, a Complex number immutable library, etc.
- Implement missing native methods in Java 1.4 library support (medium) -
Thanks to recent upgrades, the Titanium language is now
a near-superset of Java 1.4, and contains most of the associated Java-level libraries.
However, many of the native methods for the less commonly-used Java 1.4 standard library
classes remain unimplemented.
This involves selecting useful Java 1.4 classes that remain unimplemented and writing
the supporting native code methods.
- Deep exchange and possibly other deep remote copy operations (medium) -
The Titanium exchange operation currently exchanges a "shallow" copy of the
supplied array elements. In some cases, a deep copy is desirable
- Add more collective communication libraries (medium) -
The GASNet communication system provides a number of collective communication operations
that are currently not exposed at the language level in Titanium. We should design
appropriate interfaces for these operations and add them to the language/library.
- Implement a parallel version of the Berger-Rigoutsos algorithm (medium) -
The Berger-Rigoutsos algorithm is an edge-detection algorithm that attempts to cover points
in a space with rectangular regions, which can then be used to subdivide a problem and structure
further parallel computation. This algorithm is useful in various grid-based applications, and
currently the only known implementation is purely sequential. It would be nice to have a parallel
implementation, written in Titanium or C, available as a library for our applications users.
Contact Peter McCorquodale for details. Reference: Berger
& Rigoutsos, ``An Algorithm for Point Clustering and Grid Generation'', IEEE Trans. Syst., Man,
Cybern., vol. 21, pp. 1278--1286 (1991).
- Add a Titanium interface to ChomboVis (medium) -
is a program for visualization of 2D and 3D AMR data sets. It would be nice
to build a bridge from Titanium to ComboVis so application developers can
easily visualize the data sets created by their Titanium programs. This may
involve adding some minimal Titanium support for outputting application data
to the file system in the HDF5 file format.
Talk to Phil Colella for more information.
- Update the Titanium sample code repository (easy) - The Titanium webpage could use
some updates and tweaking to reflect ongoing progress - we especially
need a page of Titanium applications and benchmarks (like the private app page, but more
organized and tested to make sure it all compiles). There's also a number
of applications missing from the private page.
- Run some Java test suites on Titanium (easy) -
Titanium implements a dialect of Java 1.4, and there are a number of Java conformance
test suites available (eg the Jacks test suite from Jikes).
We should run some of these suites,
filter through the results to find failures in functionality that Titanium claims
to support, and report those bugs for fixing.
- Ti-doc implementation (medium) - we'd like a version of javadoc that
works on titanium source code. We have the source for some old versions of javadoc that
could be modified to understand Titanium and dump the relevant info
- Implement client-server remote compilation for Titanium (medium) -
Titanium supports a wide range of supercomputing systems and architectures. Generally the
most difficult issue when porting to a new system is porting the translator component,
which translates Titanium into platform-independent ISO C. It's also difficult and
labor-intensive to maintain up-to-date installations of the translator across the large
number of systems we frequently use. We'd like to add a client-server netcompilation
option to Titanium, whereby compilation on remote systems can automatically connect
to a centralized translation service (probably using HTTP and/or ssh) and submit
their source files for translation, recieving the generated code result for backend
compilation. This would allow us to run on several supercomputing systems we currently
cannot support because it is impractical or impossible to run the translator on the
target machine. A similar netcompile option has been implemented in
Berkeley UPC, which we could use as a model.
Back to main page
Tuesday April 29, 2014
Comments to Titanium developers