After all the upgrades and tweaks to the AC100 (a screen upgrade to 1280×720, cooling improvements and a clock speed boost of over 40%), only one significant issue remains: it has just 512MB of RAM. Unfortunately, the memory controller initialization is done by the closed-source boot loader, so even if we were to solder in bigger chips (Tegra2 can handle up to 1GB of RAM), it is extremely unlikely that it would just work.
So, other than increasing the physical amount of memory, can we actually do anything to improve the situation? Well, as a matter of fact, there are a few things.
Clawing Back Some Memory
By default, the GPU is allocated a hefty 64MB of the 512MB of RAM that we have. This is a substantial fraction of our memory, and it would be nice to claw some of it back if we are not using it. I find Nvidia’s Tegra binary accelerated driver too buggy to use under normal circumstances, so I use the basic unaccelerated frame buffer driver instead. There are two frame buffer allocations on the AC100: the internal display and the HDMI port. The latter is only intended for use with TVs, which means we shouldn’t need a resolution of more than 1920×1080 on that port. The highest resolution display we can have on the internal port is 1280×720. At 4 bytes per pixel, that means the maximum amount of memory used by those two frame buffers is 8100KB + 3600KB = 11700KB. To be on the safe side, let’s call that 16MB. That still leaves us 48MB that we should be able to safely reclaim. We can do that by telling the kernel that there is extra memory at certain addresses using the following boot parameters:
Make sure the accelerated binary Tegra driver is disabled in your xorg.conf, reboot and you should now have 496MB of usable RAM instead of 448MB. It’s just over an extra 10%, which should make a noticeable difference given how tight the memory is to begin with.
If you aren’t using the HDMI interface, my tests show that it is in fact possible to reduce the GPU memory to just 2MB with no ill effects, when using the 1280×720 display panel, because the frame buffer seems to operate in 16-bit mode by default:
That leaves a total of 510MB for applications.
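The frame buffer sizes quoted above can be checked with a few lines of shell arithmetic. This is only a sketch: the helper name fb_size_kb is made up for illustration, and it assumes 4 bytes per pixel for the 16MB estimate (which matches the 8100KB and 3600KB figures) and 2 bytes per pixel for the 16-bit case.

```shell
#!/bin/sh
# Frame buffer size in KB for a given width, height and bytes per pixel.
# fb_size_kb is a hypothetical helper, not part of any existing tool.
fb_size_kb() {
    echo $(( $1 * $2 * $3 / 1024 ))
}

hdmi_kb=$(fb_size_kb 1920 1080 4)     # HDMI port, assumed 32-bit colour
panel_kb=$(fb_size_kb 1280 720 4)     # internal panel, assumed 32-bit colour
panel16_kb=$(fb_size_kb 1280 720 2)   # internal panel in 16-bit mode

echo "HDMI 1920x1080, 4 bytes/pixel: ${hdmi_kb}KB"
echo "Panel 1280x720, 4 bytes/pixel: ${panel_kb}KB"
echo "Both frame buffers together:   $((hdmi_kb + panel_kb))KB"
echo "Panel 1280x720, 2 bytes/pixel: ${panel16_kb}KB"
```

The last figure comes to 1800KB, which is consistent with the internal panel fitting into a 2MB allocation when the frame buffer runs in 16-bit mode.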
In recent kernels, there are two modules that are very useful when we have plenty of CPU resources but very little memory, which is just the case on the AC100. They are zcache and zram. On the 3.0 kernels, instead of zram we can use frontswap, which is similar but has the advantage that it is aware of and cooperates with zcache. Since at the time of writing 3.0 isn’t quite as polished and stable for the AC100 as 2.6.38, let’s focus on zram instead.
Assuming you have compiled zcache support into your kernel, all you need to do to enable it is add the kernel boot parameter “zcache”. From there on, your caches should be compressed, thus increasing the amount they can store.
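To confirm that the parameter actually reached the kernel, the command line the kernel was booted with can be inspected in /proc/cmdline. The helper below is a minimal sketch; has_boot_param is a hypothetical name, and the sample command line is invented purely for the demonstration.

```shell
#!/bin/sh
# Return success if a boot parameter is present on a kernel command line.
# $1: parameter name
# $2: command line to search (optional, defaults to /proc/cmdline)
# has_boot_param is a hypothetical helper written for this example.
has_boot_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    # Split on whitespace and match either "param" or "param=value".
    for word in $cmdline; do
        case $word in
            "$param"|"$param"=*) return 0 ;;
        esac
    done
    return 1
}

# Demonstration with an invented command line (not the AC100's real one).
sample="console=tty1 zcache root=/dev/mmcblk0p1"
if has_boot_param zcache "$sample"; then
    echo "zcache was requested at boot"
fi
```

On a running system, calling has_boot_param zcache with no second argument checks the real command line. Note that this only shows the parameter was passed, not that zcache support was compiled in.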
zram provides a virtual block device backed by RAM, but the contents are compressed, so it should always end up using less than the amount of memory it presents as a block device (unless all of the data is incompressible, which is very unlikely). To err on the side of caution, the combined size of all the zram devices shouldn’t be set to more than half of the total memory. To ensure optimal performance, we should also set the number of zram devices to be the same as the number of CPU cores in the system to make sure that all CPUs end up being used (each zram device handler is a single thread).
To set the number of zram devices to 2 (Tegra2 has 2 CPU cores), we need to create the file /etc/modprobe.d/zram.conf containing the following line:
options zram num_devices=2
Then once we load the zram module (modprobe zram), we should see device nodes called /dev/zram*. We can configure the devices:
The amount of memory assigned to each zram device should be such that their total combined size doesn’t exceed half of the total physical memory in the system.
Then we can create swap headers on those zram devices using mkswap (e.g. mkswap /dev/zram0) and enable swapping to them (swapon -p100 /dev/zram0).
We should now have some compressed RAM for swapping to instead of swapping to a slow SD card.
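The zram steps above can be sketched as a small script. It assumes the zram driver exposes each device’s size as a byte count in /sys/block/zramN/disksize and that it is run as root; the function names are made up for illustration, and the module parameter is passed directly to modprobe rather than via /etc/modprobe.d/zram.conf.

```shell
#!/bin/sh
# Per-device size in bytes so that all devices together use half of RAM.
# $1: total memory in KB, $2: number of zram devices
# (hypothetical helper names throughout)
zram_disksize_bytes() {
    echo $(( $1 / 2 / $2 * 1024 ))
}

# Total memory in KB as reported by the kernel.
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Size, format and enable one swap device per CPU core.
# Must be run as root; it is defined here but not called automatically.
# $1: number of devices (defaults to 2, matching the Tegra2's two cores)
setup_zram_swap() {
    ndev=${1:-2}
    size=$(zram_disksize_bytes "$(mem_total_kb)" "$ndev")
    modprobe zram num_devices="$ndev" || return 1
    i=0
    while [ "$i" -lt "$ndev" ]; do
        # Assumed sysfs attribute: device size in bytes.
        echo "$size" > "/sys/block/zram$i/disksize" || return 1
        mkswap "/dev/zram$i" || return 1
        swapon -p100 "/dev/zram$i" || return 1
        i=$((i + 1))
    done
}

# Example: 510MB (522240KB) split across 2 devices.
echo "per-device size: $(zram_disksize_bytes 522240 2) bytes"
```

Running setup_zram_swap 2 as root would carry out the steps; swapon -s should then list both zram devices at priority 100, ahead of any SD card swap.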
It turns out that some of the default settings on Linux distributions aren’t as sensible as they could be. By default the amount of stack space each thread is allocated is 8MB. This is unnecessarily large and results in more memory consumption than is necessary. Instead we can set the soft limit to 256KB using “ulimit -s 256”. Ideally we should make this happen automatically at startup by creating a file /etc/security/limits.d/90-stack.conf containing the following:
* soft stack 256
Some users have reported that this can increase the amount of available memory after booting by a rather substantial amount. Since this is a soft limit, programs that require more stack space can still allocate it by asking for it.
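The behaviour of the soft limit can be tried out safely in a subshell before making it permanent. The following is a minimal sketch that assumes the hard stack limit is at least 256KB; it lowers the soft limit, reads it back, and then raises it again to show that a process can still ask for more.

```shell
#!/bin/sh
# Current soft stack limit of this shell, in KB (or "unlimited").
orig_soft=$(ulimit -S -s)

# Lower the soft limit inside a subshell and read it back.
# The calling shell is not affected.
lowered=$( ulimit -S -s 256; ulimit -S -s )

# Lower it, then raise it back to the original value. This works because
# only the soft limit was changed; the hard limit is untouched.
restored=$( ulimit -S -s 256; ulimit -S -s "$orig_soft"; ulimit -S -s )

echo "original soft limit: $orig_soft"
echo "after lowering:      $lowered"
echo "after raising again: $restored"
```

Only shell builtins are used inside the subshells, so nothing needs to be executed under the reduced limit.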
Choice of Software