Summary of user-visible changes in Titanium, since v 2.205

Language changes
================
* A new version of the Titanium reference manual is now available,
  documenting the following features and the semantic changes listed under
  library changes. See http://titanium.cs.berkeley.edu
* Titanium now fully supports the Java 1.4 language features (although not
  yet the library). Specifically, nested/inner/anonymous classes are now
  fully supported, along with other misc features (eg anonymous arrays,
  instance initializers, .class literals, assert, etc.).
* Titanium now fully allows mixing of templates with inheritance.
  See the language reference for details.
* Template instantiations no longer require a leading 'template' or '@'
  keyword
* Add the C macro preprocessor to the Titanium compilation process.
  See the language reference for details.
* Domain is now a java.lang.Object - Casts and assignments back and forth
  are now permitted. Titanium still provides implicit coercion from
  RectDomain to Domain, but there is no implicit coercion from RectDomain
  directly to Object (ie without an intervening cast to Domain).
* Numerous semantic clarifications to the Domain and RectDomain libraries.
* Added a mechanism to allow binary operator overloading with a primitive
  as the LHS operand - see the language reference for details.
* Overloading of == and != on reference types is now deprecated and will
  soon be prohibited. Implement .equals() instead.
* Operator overload methods are now forbidden from being static
* Result of an overloaded op-assign operator is now implicitly coerced to
  that of LHS operand, as if the user had inserted an explicit cast (in
  other words, the coercion could result in truncation for primitives, or
  runtime ClassCastExceptions for reference types - the invocation remains
  a typecheck error if the result type is not castable to the static type
  of the LHS)
* Make grid equality comparisons (TiArrayA == TiArrayB) legal everywhere in
  the language, with the semantics that it returns true iff both operands
  are null, or both are non-null and
  TiArrayA.domain() == TiArrayB.domain() and
  Forall_p in TiArrayA.domain(), TiArrayA[p] and TiArrayB[p] are the same
  variable (and similarly for TiArrayA != TiArrayB). This applies to
  explicit comparisons in user code and implicit ones in the default
  immutable comparison methods.
* Default immutable == and != operations have been clarified to compare
  their instance fields one at a time (in order of declaration) using ==
  or != respectively (which may end up calling a user's op== or op!= for
  any immutable-typed fields, and the left/right operand ordering will
  match the original call), and using short-circuit evaluation (ie ==
  stops at the first field that compares unequal, and similarly for !=).
* Any user-provided methods on immutable type T for the op==()/op!=()
  methods having one argument of type T must now have *exactly* this
  signature:
     public boolean single op==(T single I);
     public boolean single op!=(T single I);
* Remove Point, Domain and RectDomain as language keywords
* foreach loops now allow an optional Point declaration for the iteration
  variable, as in:
     foreach (Point<1> p in [1:10]) { ...
     }

Performance improvements
========================
* Many performance improvements to Titanium Arrays (most noticeable for
  small array copies, transpositional copies, code manipulating/creating
  Titanium Arrays, and loop overheads for short loops over Titanium arrays).
* Significant improvements to the performance of static accesses and
  Ti.thisProc() on pthreaded backends (all the *smp backends).
* Construction of unit-stride RectDomains and RectDomains with manifest
  constant bounds is significantly faster now
* Many performance improvements to Point construction and manipulation.
  Notably, the common case of Point.all({0,1}) is now expanded inline to a
  simple value copy, and Point<1> are now represented directly as ints, so
  Point<1>.op[] is faster.
* Improvements to lowering to remove dead generated code, which may provide
  a performance improvement
* Tuned the sequential performance of monitors for the case of uncontended
  locks, removing two dynamic thread lookups and a dynamic allocation from
  the uncontended lock/unlock path.
* Improve method dispatch codegen to use static dispatch wherever possible.
* Add a new "extrafold" constant folding optimization pass, which performs
  additional constant folding that is not required by Java semantics (but
  useful for optimization). This notably includes folding of static final
  fields which are accessed via "instance.fieldname" (rather than
  "typename.fieldname"), and folding of non-static final fields with
  compile-time constant initializers.
* Replace the optimizer's dominator analysis with a faster and more
  scalable implementation, reducing compilation time on large programs.
* foreach loops over Domains<> are much faster now
* Improved initialization of immutable-typed fields to reduce temps and
  copies
* Improved inlining of immutable constructors to reduce temps and copies
* Numerous performance improvements to GASNet communication backends.
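
Example: source shapes targeted by the "extrafold" pass
-------------------------------------------------------
The two access patterns named in the "extrafold" item above can be
illustrated with a small sketch in plain Java (not Titanium). The class
name ExtrafoldShapes, the nested class Grid, and the fields BLOCK and ghost
are all hypothetical, chosen only for illustration. The sketch shows the
shape of the source expressions and the constant they could reduce to; it
does not demonstrate what the Titanium compiler actually emits.

```java
public class ExtrafoldShapes {
    public static class Grid {
        // Hypothetical static final field with a constant initializer.
        public static final int BLOCK = 8;
        // Hypothetical non-static final field with a compile-time
        // constant initializer.
        public final int ghost = 2;
    }

    // BLOCK is read through an instance ("instance.fieldname") instead of
    // through the type name ("typename.fieldname"), and ghost is a
    // non-static final field. Both reads always yield the same constants.
    public static int paddedViaInstance(Grid g) {
        return g.BLOCK + 2 * g.ghost;
    }

    // The value an extra constant folding pass could substitute for the
    // expression above: 8 + 2 * 2.
    public static int paddedFolded() {
        return 12;
    }

    public static void main(String[] args) {
        Grid g = new Grid();
        // Both computations agree on the same value.
        System.out.println(paddedViaInstance(g) + " " + paddedFolded());
    }
}
```

Both methods return the same value, which is the property such a folding
pass relies on.
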

Domain Library changes
======================
* The Domain library has been completely reimplemented using smarter
  algorithms and a smarter representation, leading to vastly improved
  performance for most operations (at worst a few ops have negligible
  change)
* The RectDomain library has been carefully tuned and many operations are
  significantly faster now.
* Titanium now supports Points, Domains, RectDomains and Titanium arrays of
  more than 3 dimensions. Use "configure --with-max-tiarity=N" to configure
  for a max arity of N.
* Domain "safety" checks are now disabled by default, unless you compile
  with -g. This includes checking for user errors like setting a negative
  stride
* Domain,RectDomain method .isNull() has been renamed to .isEmpty()
* "Domain == Domain" and "Domain != Domain" are now deprecated and will
  soon be removed. Use the new Domain.equals() method instead.
* The mangling scheme for Point methods has changed to match the general
  method mangling scheme. Any native code accessing Points will likely
  need to be updated.
* Domain.toString() on an empty Domain used to return an empty string.
  Now it returns the string representation of an empty RectDomain.
* Added a Domain copy constructor, which is the recommended way to
  "localize" a remote Domain.
* Add many op-assign methods to Point, RectDomain and Domain
* Add new Point methods Point.replace, Point.upperBound and
  Point.lowerBound
* Fix a number of correctness problems with Domain, specifically with
  unions, differences, boundingBox and isRectangular producing incorrect
  answers for some non-unit-stride domains

Misc Library changes
====================
* Added explicitly non-blocking array copy operations for Java and Titanium
  arrays. See language reference for details.
* ti.lang.Timer now uses the fastest and most accurate timers available on
  the given CPU architecture.
* Upgrade ti.lang.PAPICounter to support PAPI3, and include many fixes for
  performance and robustness
* Added more intuitive PAPI counter names. The old names are still
  available, so existing codes using PAPI should not break. A .toString()
  method is added for PAPI counters.
* Performance improvements to java.util.Random - the frequently-called
  nextDouble() method is about 4x faster now.
* Add the ti.lang.Complex library, a convenience library for manipulating
  complex numbers.
* Added a TiArray.isContiguous() method for querying virtual memory
  contiguity of array data.
* ti.lang.CompileSettings now provides public static final variables set to
  current compile settings
* Add ti.lang.Ti methods providing programmatic control over when GC's can
  happen - see the language reference for details.

Misc changes
============
* New supported platforms: Infiniband/vapi, Quadrics/elan4, Mac OSX,
  Cray X1, Intel C, IBM xlC C++, udp/64, PathScale C
* Many, many bug fixes - see ChangeLog or the Titanium bug database for
  complete details.
* Upgrade the Boehm-Weiser GC to 6.5, which includes many, many stability
  fixes, especially for AIX, OSX, IRIX, and HPUX.
* Many improvements to the runtime error reporting for bounds check errors,
  assertions and other safety violations.
* Add communication profiling support for gasnet-* backends, via the
  gasnet_trace tool.
  See http://titanium.cs.berkeley.edu/doc/gasnet-trace.txt
* Titanium now fully implements the Java local variable shadowing rules,
  which includes prohibiting shadowed declarations introduced by a foreach
  loop header.
* Add hooks to raise the CPU and memory rlimits to the maximum in the
  compiler and applications
* New tcdemangle program provides demangling of Titanium types and methods,
  see "tcdemangle --help"
* Improved error messages for templates
* Improve the creation of the StringLiteral table (better performance, less
  memory waste, more Java-like semantics), and fix some init-time race
  conditions on *smp backends.
* Improvements to compiler diagnostic output, especially for code using
  templates or nested classes, and tc debugging dumps (for debugging the
  compiler)
* Add a new environment variable TI_POLITE_SYNC that forces polite-mode
  synchronization, regardless of apparent thread/CPU counts
* Add a configure option --enable-debug which forces global debugging mode
  for the compiler, runtime, GASNet and all built applications.
* Ensure all threads get a reasonable default stack size of 2MB on *smp
  backends (also tweakable via new environment variable TI_STACK_SIZE).
* Add new environment vars for finding region memory allocation bugs, see
  "tcbuild --help-envvars"

tcbuild changes
===============
* Improvements to tcbuild compilation strategy that reduce disk churn and
  temp space requirements during compilation, especially for building the
  tlibs.
* tcbuild on SMP's now uses parallel make by default for C compilation.
* Allow permutation of the tcbuild arguments - source files and arguments
  can now be freely intermixed on the command line
* Embed additional information about Titanium applications in compiled
  executables as ident strings (run "ident yourprog" to see info about a
  Titanium executable)
* Add a warning-suppression mechanism to tc. For example, this command
  line:
     tcbuild --tc-flags "-woff deprecated-isnull"
  would suppress the new warning about the deprecated method RD.isNull().
* Add a warning if users invoke tcbuild -O without --nobcheck
* Upgrade tcbuild argument handling:
  ---------------------------------
  Several of the tcbuild options which are used to pass flags to other
  tools (namely --cc-flags --ld-flags --ld-libs --make-flags
  --classlib-{pre,post} and --tc-flags) are documented as "adding" the
  given flags to the invocation for the relevant tool - however the
  previously implemented behavior was actually "replace" - ie, if
  --cc-flags was passed twice on the tcbuild command line (directly or via
  TCBUILD_FLAGS), the second one would take precedence and erase the first
  one (and any default values).

  The new behavior is additive - passing multiple instances of --cc-flags
  --ld-flags --ld-libs --make-flags --classlib-{pre,post} or --tc-flags on
  the tcbuild command line adds to the current option value (without
  replacing the earlier options). This allows for better use of
  TCBUILD_FLAGS (eg if you want to set some default --tc-flags options for
  all compilations) and prevents the need for lots of tricky quoting on
  the tcbuild command line to pass multiple options to the various tools.
  It's also closer to the behavior of gcc's analogous options like -Wl,
  -Wp, etc.

  Note that --cc-opt-flags, --cc-debug-flags and --cc-system-flags keep
  their "replace" behavior (since these are used to control what we give
  the C compiler for -O vs -g, and are generally set by the configure
  script and not the user) - the help screen has been updated to clarify
  that.

--- Updated backend information ---

Here are descriptions of the Titanium backends currently available:

* sequential - A single Titanium process - useful for testing and
  debugging.
* smp - This parallel backend runs the Titanium processes as Posix threads
  within a single shared memory space, with fast "narrow" pointers. You
  should specify the number of parallel threads by setting the environment
  variable TI_THREADS = "N" (for N threads).
* mpi-cluster-uniprocess - portable, high-performance MPI-based cluster
  backend that should run on any cluster with MPI 1.1 or better
* mpi-cluster-smp - Same as above, but for a cluster of SMP's (CLUMP). Each
  node in the network runs one or more Titanium processes as Posix threads.
  For more detailed information about using the mpi-* backends, see:
  http://titanium.cs.berkeley.edu/doc/mpi-backend-usage.txt
* udp-cluster-uniprocess - the portable UDP-based cluster backend that
  should run on basically any set of machines that have sshd and a basic
  TCP/IP stack (they can also take advantage of gexec on Millennium).
* udp-cluster-smp - Same as above, but for a cluster of SMP's (CLUMP). Each
  node in the network runs one or more Titanium processes as Posix threads.
  For more detailed information about using the udp-* backends, see:
  http://titanium.cs.berkeley.edu/doc/udp-backend-usage.txt
* gasnet-gm-uni, gasnet-gm-smp - the backend for high-performance
  communication on clusters of uniprocessors and clusters of SMP's
  connected via Myrinet/GM
* gasnet-elan-uni, gasnet-elan-smp - the backend for high-performance
  communication on clusters of uniprocessors and clusters of SMP's
  connected via Quadrics/Elan 3 or 4.
* gasnet-vapi-uni, gasnet-vapi-smp - the backend for high-performance
  communication on clusters of uniprocessors and clusters of SMP's
  connected via Infiniband with Mellanox VAPI software.
* gasnet-lapi-uni, gasnet-lapi-smp - the backend for high-performance
  communication on the IBM SP using the LAPI interface - subsumes the old
  sp3 backend (which still works).
* sp3 - Runs on the IBM SP Power machine with LAPI as the low-level
  communication system (uses an AM-to-LAPI compatibility layer).
* gasnet-mpi-uni, gasnet-mpi-smp - for testing purposes only
* cray-t3e - Runs on the Cray T3E using shmem. This platform has a few
  limitations (notably, the lack of a garbage collector).
* mill-cluster-uniprocess, mill-cluster-smp, sp2, sp2clump, tera-thread,
  now-cluster-uniprocess - deprecated backends
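
--- Example: additive vs. replace flag handling ---

The difference between the old "replace" and the new "additive" handling
of tcbuild pass-through options (described under "Upgrade tcbuild argument
handling") can be modeled with a short sketch in plain Java. This is only
an illustration of the documented semantics, not tcbuild's implementation:
the class FlagAccumulation, its methods, and the sample flag values -DFOO
and -DBAR are hypothetical, and default option values are not modeled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FlagAccumulation {
    // Options documented as keeping "replace" behavior.
    static final Set<String> REPLACE_ONLY = new HashSet<>(Arrays.asList(
        "--cc-opt-flags", "--cc-debug-flags", "--cc-system-flags"));

    // Old behavior: a later occurrence of any option overwrites the
    // earlier value.
    public static Map<String, String> replaceMode(List<String[]> opts) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String[] o : opts) {
            out.put(o[0], o[1]);
        }
        return out;
    }

    // New behavior: repeated pass-through options are appended to the
    // current value, except for the replace-only options above.
    public static Map<String, String> additiveMode(List<String[]> opts) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String[] o : opts) {
            if (REPLACE_ONLY.contains(o[0])) {
                out.put(o[0], o[1]);
            } else {
                out.merge(o[0], o[1], (prev, next) -> prev + " " + next);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical command line with --cc-flags given twice.
        List<String[]> opts = Arrays.asList(
            new String[] {"--cc-flags", "-DFOO"},
            new String[] {"--cc-flags", "-DBAR"});
        System.out.println("replace:  " + replaceMode(opts).get("--cc-flags"));
        System.out.println("additive: " + additiveMode(opts).get("--cc-flags"));
    }
}
```

In this model the old scheme keeps only the last --cc-flags value, while
the additive scheme passes both values on to the C compiler invocation.
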