Mac OS X Reference Library Apple Developer

Should You Recompile Your Software as a 64-Bit Executable?

Although 64-bit executables make it easier for you to manage large data sets (compared to memory mapping of large files in a 32-bit application), the use of 64-bit executables may raise other issues. Therefore you should transition your software to a 64-bit executable format only when the 64-bit environment offers a compelling advantage for your specific purposes.

This chapter explores some of the reasons you might or might not want to transition your software to a 64-bit executable format. Read this chapter first to decide whether your software will benefit from a 64-bit executable format; if you are convinced that it will, read the remaining chapters in this document.

If some of the capabilities of a 64-bit environment would be helpful to you but you do not want to transition your software to a 64-bit executable, read the section “Alternatives to 64-Bit Computing” to learn techniques that offer many of the same benefits but let you remain in a 32-bit environment.

Common Misconceptions

Before going further, it is important to dispel a few common misconceptions:

A 64-bit executable is not inherently faster. On Intel-based Macintosh computers, the additional registers usually yield a modest speedup; on PowerPC-based computers, 64-bit executables typically run slightly more slowly unless they make significant use of 64-bit math.

You do not need a 64-bit executable to perform 64-bit integer math. A 32-bit application can use long long data types, albeit with somewhat lower performance.

You do not need a 64-bit executable to work with files larger than 2 GB. Nearly all of the file interfaces in Mac OS X accept 64-bit offsets even in 32-bit applications.

Factors to Consider

A 64-bit executable can provide many benefits to users and to programmers, depending on the nature of your program. As a general rule, although a 32-bit application can provide the same functionality as a 64-bit application, a 64-bit application requires less effort to support large data sets.

Some applications can benefit significantly from 64-bit computing on both PowerPC and Intel. These include data mining, web caches and search engines, CAD/CAE/CAM software, large-scale 3D rendering (such as a movie studio might use, not a computer game), scientific computing, large database systems (for custom caching), and specialized image and data processing systems.

On Intel-based Macintosh computers, most applications will be somewhat faster when recompiled as a 64-bit executable. Whether this benefit justifies the needed porting effort depends largely on how important performance is to your particular application and whether your application would benefit from a larger address space.

Note: With very few exceptions, you should not compile your applications as 64-bit-only executables. Instead, you should compile them as three-way or four-way universal binaries containing both 32-bit Intel and PowerPC executable code as well as executable code for one or both 64-bit architectures.

There are a number of factors to consider when deciding whether to make your application run in 64-bit mode. These considerations are described in the sections that follow:

Operating System Version

Prior to Mac OS X v10.6, all applications that shipped with the operating system were 32-bit applications. Beginning in v10.6, applications that ship with the operating system are generally 64-bit applications.

This means that in v10.5 and earlier, the first third-party 64-bit application that a user runs causes the entire 64-bit framework stack to be brought into memory, resulting in a launch performance penalty and significant memory overhead.

Similarly, in v10.6 and later, the first non-64-bit-capable application pays a performance and memory footprint penalty because Mac OS X must bring in the entire 32-bit framework stack.

Kernel Extensions

If you are writing a kernel extension, you must make it 64-bit-capable. Beginning in Snow Leopard, some hardware configurations use a 64-bit kernel by default. The 64-bit kernel cannot load 32-bit kernel extensions.

Performance-Critical Applications

If your application is performance critical, you might want to recompile your application as a 64-bit executable, particularly on Intel-based Macintosh computers.

Here’s why. The 64-bit Intel architecture contains additional CPU registers that are not available when compiling a 32-bit Intel executable. For example, the 64-bit architecture has 16 general-purpose integer registers instead of 8. Because of the extra register space, the first few arguments are passed in registers instead of on the stack. Thus, by compiling some applications as 64-bit, you may improve performance because the code generates fewer memory accesses on function calls. As a general rule, 64-bit Intel executables run somewhat more quickly unless the increased code and data size interact badly (performance-wise) with the CPU cache.

By contrast, executables compiled for the 64-bit PowerPC architecture can access the same number of registers (32) as 32-bit PowerPC executables. As a general rule, 64-bit PowerPC executables will execute slightly more slowly unless they make significant use of 64-bit math. Thus, if your application does not require a 64-bit address space, you may want to ship your application as a 32-bit executable on PowerPC by default.

As with any complicated software system, it is difficult to predict the relative performance of recompiling a piece of software as a 64-bit executable. The only way to know for certain (on either architecture) is to compile for 64-bit and benchmark both versions of the application.

Here are some of the potential performance pitfalls: pointers and some data types double in size, so data structures grow and put more pressure on the processor caches; the larger code size can likewise interact badly with the instruction cache; and on PowerPC, 64-bit executables run slightly more slowly unless they make significant use of 64-bit math.

For the most part, these potential performance impacts should be small, but if your application is performance critical, you should be aware of them.

“Huge” Data Objects

If your application may need random access to exceptionally large (>2GB) data sets, it is easier to support these data sets in a 64-bit environment. You can support large data sets in a 32-bit application using memory mapping, but doing so requires additional code. Thus, for new applications, you should carefully evaluate whether supporting such large data sets is required in the 32-bit version of your application.

Note: It is not generally necessary to use 64-bit programming when working with files larger than 2 GB in a streaming fashion, such as when writing an audio or video application. These sorts of applications work with only a small section of a file at any given time and thus do not generally benefit significantly from the large address space of 64-bit computing. That said, these applications often do benefit from the additional registers afforded by 64-bit computing on the Intel architecture.

64-Bit Math Performance

Applications that use 64-bit integer math extensively may see performance gains on both PowerPC- and Intel-based Macintosh computers. In 32-bit applications, 64-bit integer math is performed by breaking the 64-bit integer into a pair of 32-bit quantities. It is possible to perform 64-bit computation in leaf functions in 32-bit applications, but this functionality generally offers only limited performance improvement.

Note: You do not need to transition your application to a 64-bit executable format merely because your application performs 64-bit math. You can perform 64-bit math transparently in a 32-bit application, albeit with slightly diminished performance.

Plug-in Compatibility

If you are writing an application, any plug-ins used by your application must be compiled for the same processor architecture and address width as the running application. For this reason, if your application depends heavily upon plug-ins (audio applications, for example), you may want to ship it as 32-bit for now.

Alternatively, you might add a user-selectable install option for the 64-bit version and then glue the two binaries together using the lipo command in a postinstall script. Doing so will encourage plug-in developers to update their code for 64-bit execution and at the same time will minimize user complaints.
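If you take this route, the postinstall step might look something like the following sketch. The file names are hypothetical; lipo -create is the standard way to combine single-architecture Mach-O binaries, and lipo -info reports the architectures a binary contains.

```shell
# Combine the 32-bit and 64-bit builds of the app into one universal binary.
# MyApp.i386 and MyApp.x86_64 are hypothetical single-architecture executables.
lipo -create MyApp.i386 MyApp.x86_64 -output MyApp

# Verify which architectures the combined binary contains.
lipo -info MyApp
```

Because lipo ships with the Mac OS X developer tools, this step runs only on the user's machine at install time; no additional tooling needs to be bundled.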

If you are writing a plug-in, you should begin transitioning your plug-in to 64-bit so that when 64-bit versions of the supporting application become available, your plug-in will not get left behind.

Beginning in Snow Leopard, Apple-developed applications (including key components of the OS) are transitioning to 64-bit executables. This means that users with 64-bit-capable computers will be running the 64-bit slice of these key system components. Any plug-ins (screen savers, printer dialog extensions, and so on) that need to load in these applications must be recompiled as 64-bit plug-ins.

As a special exception, the System Preferences application provides a 32-bit fallback mode. If the user selects a preference pane that lacks a 64-bit slice, System Preferences relaunches itself as a 32-bit executable (after displaying a dialog box). To provide the best user experience, however, you should still transition these preference panes to 64-bit plug-ins at your earliest convenience.

Memory Requirements

The memory usage of a 64-bit application may be significantly larger than for a 32-bit version of the same application. The difference in usage varies from application to application depending on what percentage of data structures contain data members that are larger in a 64-bit process. For this reason, on a computer with a small amount of memory, you may not want to run the 64-bit version of your application even if the computer can support it.

This concern is described in more detail in “Performance Optimization,” along with some tips for improving your memory usage in a 64-bit environment.

Alternatives to 64-Bit Computing

If you need your application to do 64-bit integer math, you can do so already in Mac OS X by using long long data types.

On PowerPC, if you compile your application using the -mcpu=G5 flag (to use G5-specific optimizations) and the -mpowerpc64 flag (to allow 64-bit math instructions), your 32-bit application can achieve 64-bit math performance comparable to that of a 64-bit application. This technique has some performance disadvantages, however, because nonleaf functions still work with 64-bit integer values in a pair of 32-bit registers due to the design of the 32-bit function call ABI.

Applications compiled with the -mcpu=G5 and -mpowerpc64 flags will not execute on non-G5 hardware. If you need to support G3 or G4 hardware, you can still do 64-bit math without these options with only a small performance penalty.

If your application accesses large files in a streaming fashion, such as an audio or video application, you can use existing Mac OS X file interfaces. Nearly all the file interfaces in Mac OS X are capable of handling 64-bit offsets even in 32-bit applications. However, Mac OS APIs that existed prior to HFS+ (such as QuickTime) may require you to use different functions for large file access. See the latest documentation for the APIs you are using for more specific information.

If you have a performance-critical application that would benefit from more than 4 GB of memory, you should read the section “Using mmap to Simulate a Large Address Space.”

Using mmap to Simulate a Large Address Space

As an alternative to using a large address space, you can simulate one in your application by creating your own pseudo-virtual-memory engine using the mmap system call. Instead of referring to data using pointers, use a data structure that contains a reference to a file and an offset into that file.

At first glance, this technique may seem incredibly inefficient, because you would expect the operating system to constantly move data into and out of memory. In practice, however, the Mac OS X VM system caches open files heavily. Thus, even though your application has only 4 GB of address space for use at any given time, your application can actually use far more than 4 GB of physical memory concurrently in the form of disk caches.

For this reason, if you do not close the file descriptor after you call mmap on the file, and if your computer’s RAM is large enough to hold your application’s entire data set, most of the memory mapping and unmapping operations should require little or no I/O. If the physical RAM is not large enough, your data ends up being paged to disk anyway; thus your performance is only marginally affected. Upon closing the file descriptor, these pages are released (after flushing dirty pages to disk).

Note: For optimal performance, you should generally limit the amount of data that you map at any given time to some reasonable percentage of total physical memory. To obtain the size of physical memory, you can use the following code:

#include <inttypes.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysctl.h>
 
int main(void)
{
    uint64_t mem_size;
    size_t len = sizeof(mem_size);
    if (sysctlbyname("hw.memsize", &mem_size, &len, NULL, 0) != 0) {
        perror("sysctlbyname");
    } else {
        printf("RAM size in bytes is %" PRIu64 ".\n", mem_size);
    }
    return 0;
}

When you need to access a piece of data, your in-application virtual memory code checks to see whether that information has already been mapped into memory. If not, it should map the data using mmap. If the mmap operation fails, your application has probably run out of usable virtual address space and must therefore choose a “victim” memory region and unmap it.

For optimal performance, a user-space VM system must use proper mapping granularity for the data. If the data divides neatly into fixed-size objects, these provide good units for mapping. Because the length of the mapped region always rounds up to the nearest page size boundary, you will usually find that performance improves if you map in groups of objects.

Note: You can find out the page size of the computer hardware you are using with the following code:

#include <inttypes.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysctl.h>
 
int main(void)
{
    uint64_t page_size;
    size_t len = sizeof(page_size);
    if (sysctlbyname("hw.pagesize", &page_size, &len, NULL, 0) != 0) {
        perror("sysctlbyname");
    } else {
        printf("Page size in bytes is %" PRIu64 ".\n", page_size);
    }
    return 0;
}

If your data doesn’t have convenient fixed-size objects, you may choose an arbitrary page size (no less than the underlying physical page size) and divide the data into pages of that size. (A power-of-2 boundary is particularly convenient because you can then calculate the page number and the offset into the page by using bit masks and shift operations.)

No matter how you map the data, unless you do a lot of access pattern profiling, you may find it difficult to guess a good mapping granularity for most applications. For this reason, you should design your code with proper abstraction so that you can more easily adjust the mapping granularity in the future.

The code sample in “Simulating a 64-Bit Address Space with mmap and munmap” demonstrates the use of mmap to map and unmap pieces of a large file.




Last updated: 2010-01-15
