Accelerate.framework errata

Below is a list of known issues with Accelerate.framework / vecLib that may cause your application to operate incorrectly:

vsub

In MacOS X.2.7 (G5 only) and MacOS X.3.0 (G4 and G5 only), but not MacOS X.2.8 or earlier or later revisions of MacOS X,

workaround 1

Determine the version of MacOS X at runtime and do the correct thing.
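A minimal sketch of such a runtime check follows. It assumes the system version is available as the packed BCD value that Carbon's `Gestalt` reports for the `gestaltSystemVersion` selector (for example `0x1030` for 10.3.0). The helper name `vsub_release_is_affected` is hypothetical, and the sketch ignores the CPU condition (G4 versus G5), so it conservatively flags both releases.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of vecLib. `system_version` is assumed to be
 * the packed BCD value reported for gestaltSystemVersion, e.g. 0x1027 for
 * 10.2.7 and 0x1030 for 10.3.0. */
bool vsub_release_is_affected(long system_version)
{
    /* Only the two releases named in the erratum are flagged; the G4/G5
     * distinction is deliberately not checked here. */
    return system_version == 0x1027 || system_version == 0x1030;
}
```

On the affected systems the value would come from a call such as `Gestalt(gestaltSystemVersion, &version)`; the application then takes its workaround path only when the helper returns true.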
You can also use sysctl to determine the OS revision. This might be lighter weight for mach-o applications.

workaround 2

Another way is to use
This method is likely to be slower.

LAPACK thread safety

MacOS X applications that intend to call the LAPACK linear algebra APIs from multiple threads must take the following precautions to ensure correct results. LAPACK is part of the Accelerate and vecLib frameworks. Prototypes for its APIs can be found in:
In MacOS X Release 10.2, LAPACK is not thread-safe. Applications that intend to call the LAPACK APIs from multiple threads must implement their own locking discipline to prevent simultaneous execution of LAPACK routines.

In MacOS X Release 10.3, LAPACK thread-safety is greatly enhanced. Applications that intend to call the LAPACK APIs from multiple threads must ensure that the following two initialization calls are completed before commencing simultaneous execution of LAPACK routines.

In C:
In FORTRAN:
Fortran calling vecLib's CDOTC, CDOTU, ZDOTC, and ZDOTU

The FORTRAN entry points in Mac OS X's vecLib adhere to the call/return conventions of g77. In particular, with g77, the return value of a COMPLEX or DOUBLE COMPLEX function is stored to memory through a pointer. The caller must take care to pass that pointer in PPC general purpose register R3 according to the g77 ABI.

With xlf (and the emerging g95), COMPLEX and DOUBLE COMPLEX function return values are left in the PowerPC floating point register file. Modern implementations of the C language use the same approach and no doubt gave impetus to this characteristic of modern FORTRAN.

Just four Level 1 BLAS functions are at issue: CDOTC, CDOTU, ZDOTC, and ZDOTU. Each returns a COMPLEX (or DOUBLE COMPLEX) value. When xlf compiles a function invocation into a call to one of these routines, it expects to find the *return* value in the floating point register file. When g77 compiles a function invocation into a call to one of these routines, it expects to find the return value in a pre-allocated *memory* location. The vecLib implementation of these four functions is compatible with the g77 scheme, but not the xlf scheme.

xlf codes may incorporate the following "wrappers" that re-implement CDOTC, CDOTU, ZDOTC, and ZDOTU in terms of a utility *subroutine* already present in vecLib. With xlf there is no ABI conflict in the call/return scheme for these vecLib subroutines. It is crucial, though, that the same compiler (e.g. xlf) compiles both the callers of these replacements and the replacements themselves, so that the *function* return ABI matches. The utility subroutines (cblas_*_sub) are fully optimized for PowerPC.
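The same wrapper pattern can be illustrated in C. This is only a sketch under stated assumptions: `ref_cdotc_sub` is a hypothetical portable stand-in that mimics the pointer-result calling shape of `cblas_cdotc_sub` (it is not the vecLib routine and handles positive strides only), and `complex_float` and `wrap_cdotc` are illustrative names. The point is that the subroutine hands its result back through memory, while the wrapper's by-value return is generated by whichever compiler builds both the wrapper and its callers.

```c
#include <assert.h>

/* Illustrative type: interleaved real/imaginary parts, as BLAS stores them. */
typedef struct { float re, im; } complex_float;

/* Hypothetical stand-in with the calling shape of a cblas_*_sub routine:
 * the dot product conj(x) . y is written through `dotc`, never returned. */
void ref_cdotc_sub(int n, const void *x, int incx,
                   const void *y, int incy, void *dotc)
{
    const float *xf = x, *yf = y;
    float re = 0.0f, im = 0.0f;

    assert(n >= 0 && incx > 0 && incy > 0);
    for (int i = 0; i < n; i++) {
        float xr = xf[2 * i * incx], xi = xf[2 * i * incx + 1];
        float yr = yf[2 * i * incy], yi = yf[2 * i * incy + 1];
        /* conj(x) * y = (xr - i*xi) * (yr + i*yi) */
        re += xr * yr + xi * yi;
        im += xr * yi - xi * yr;
    }
    ((float *)dotc)[0] = re;
    ((float *)dotc)[1] = im;
}

/* Function-style wrapper: its return convention is whatever the compiler
 * building this file uses, so callers must be built with the same compiler. */
complex_float wrap_cdotc(int n, const complex_float *x, int incx,
                         const complex_float *y, int incy)
{
    complex_float result;
    ref_cdotc_sub(n, x, incx, y, incy, &result);
    return result;
}
```

In a real xlf build the body of the wrapper would call the vecLib subroutine instead of the stand-in; the structure of the wrapper stays the same.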
vImage Scale Operations

On MacOS X.3.{0,1,2}, the vImage Scale function may fail to properly translate the image vertically while it is scaling it. This can result in a resized image that is also translated. The last pixel row will be expanded to occupy a part of the image. It is recommended that you use the Affine Warp function instead, which does not have this problem. It may be slightly faster to use the low level shearing functions to do scaling, since that would be a two pass algorithm instead of a three pass algorithm.

On MacOS X.3 (any), the vImage Scale function does not correctly set the kvImageEdgeExtend flag internally. To avoid edging artifacts, pass this flag with vImageScale* on MacOS X.3. This problem is fixed on MacOS X.4.

vImage Shear Operations

The 1D shear operations do not support the case where the destination buffer size in the orthogonal dimension to the shear dimension (plus the

This limitation does not extend to size disparities in the shear dimension. In our horizontal shear example, if the width of the destination buffer is larger than the source buffer, the function handles the case gracefully, filling the residual space that does not map to any location in the source buffer with either the background color or the nearest edge pixel if

We do support oversized destination buffers in the orthogonal dimension through the AffineWarp functionality. The 1D shears are intended to be low level bottleneck functions, and have a few limitations that the higher level functions do not have.

AltiVec PEM misalignment algorithm errata

Section 3.1.6.1 of the AltiVec PEM details algorithms for dealing with loading and storing misaligned vectors. These algorithms are broken for the case where the data is actually 16 byte aligned, and may in rare circumstances lead to a segmentation fault. At issue here is the second aligned vector load or store at address + 16 bytes. If the address is 16 byte aligned, this second load or store may contain no valid bytes. If it happens to fall on a new page, that page may be unmapped, in which case your application will receive a segmentation fault.

On MacOS 9, the memory space is a large contiguous region, so such segmentation faults rarely or never occurred. MacOS X is more at risk since the address space is fragmented into mapped and unmapped areas. Changes in malloc/valloc may expose your application to new arrangements of mapped and unmapped areas. This may cause applications that "worked" under previous OS releases to fail on later ones. This is not a bug in the operating system. It is a bug in the PEM misalignment algorithm, which causes aligned accesses to land in unknown memory spaces.

The simplest fix is to do the second load or store at address + 15 bytes instead of address + 16. For single vector loads, the algorithm is as follows:
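A portable scalar model of this corrected load makes the address arithmetic easy to check. It is a sketch only, with hypothetical function names: the two 16-byte `memcpy` calls stand in for the aligned `vec_ld` loads, the low four address bits play the role of the `vec_lvsl` permute mask, and the final copy stands in for `vec_perm`.

```c
#include <stdint.h>
#include <string.h>

/* Start of the aligned 16-byte block containing `addr`; an aligned vector
 * load ignores the low four address bits in the same way. */
static const uint8_t *aligned_block(uintptr_t addr)
{
    return (const uint8_t *)(addr & ~(uintptr_t)15);
}

/* Hypothetical scalar model of a misaligned 16-byte load from `p`. */
void load_unaligned_model(const uint8_t *p, uint8_t out[16])
{
    uintptr_t addr = (uintptr_t)p;
    unsigned shift = (unsigned)(addr & 15);        /* role of the permute mask */
    uint8_t pair[32];

    memcpy(pair, aligned_block(addr), 16);         /* first aligned load */
    /* Second aligned load at +15, not +16: when `p` is already aligned this
     * re-reads the same block instead of touching the following one. */
    memcpy(pair + 16, aligned_block(addr + 15), 16);
    memcpy(out, pair + shift, 16);                 /* select the 16 wanted bytes */
}
```

When `p` is 16 byte aligned, both loads read the block at `p`, so nothing beyond `p + 15` is accessed. When it is misaligned, the second load covers the following block exactly as the original algorithm intended.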
The situation for stores is similar, though some attention must be paid to store order:

Note: Blind use of the code provided above may lead to disappointing performance. Clearly, quite a bit of this work can be recycled between adjacent misaligned vectors. Please see the section on fast misalignment handling for how to efficiently and correctly load misaligned vectors.

Note: Though we take great pains not to (we have an entire cluster set up to detect this error condition), we have in the past occasionally fallen victim to this problem ourselves. If you experience crashes, the workaround is to allocate your image buffer and/or temp buffer to be 1 byte larger than it needs to be. If the problem is in the Accelerate.framework, please also file a bug report at http://bugreporter.apple.com against the Accelerate/X component.