
ARM Boards and Me

After a long time waiting and hoping, I finally received my Parallella board! …to be reminded I didn’t get the accessory pack and that it runs as hot as a stove (damnable Zynq chip…). I was able to boot it up on USB power, and keep it cool with a box fan I had nearby, but of course this is not workable. I ordered power supplies and small 5V fans from Jameco, but they will not be here until next week >:/ In the meantime, I thought I should write down what I’ve been doing so far in my quest to use ARM boards for my personal and professional needs.

For ARM boards in general, ideally I would like to run Octopus and NWChem (two of my favorite computational packages), and also to have them use an ARM optimized BLAS/LAPACK implementation. With regard to graphics, the mobile GPUs in ARM SoCs all (except for Nvidia’s newer K1 I think) implement OpenGL ES as opposed to full “desktop” class OpenGL, necessitating some porting to run most of the software I use on my laptop.

Working on these was why I purchased the Odroid U3. In hindsight, I probably should have gotten the XU… (I decided to hold out for the XU2, which I’ve since learned _may_ come out Q3…). With the U3 I underestimated the progress of Lima and the state of the Mali blobs, underestimated how much I’d want OpenCL support, didn’t realize the XU had better upstream/FDT support, and didn’t realize the Cortex-A15s have PAE (which would have allowed me to natively compile Firefox… except ‘ld’ is still 32-bit, so never mind :P). That being said, for the price and size, I love my U3. I’ve been casually splitting my time between making it “personal use”-able and having it run as a computational node (so I don’t have to keep running my enormous 200W rack-mount server). I’m not quite there in either regard. For personal use the two main detractors are some quirkiness in the xf86-video-fbturbo driver (it occasionally doesn’t properly dirty a region, which can lead to some display corruption) and the lack of GLES support in applications. The GLES work is mostly just taking the time to recompile packages with GLES support and without GLX, and then making sure they properly find the Mali blobs. However, the showstopper is Avogadro.

The newest release of Avogadro needed a small patch to compile on ARM, and it runs with software GLX support. However, if I wish to retain my sanity I’ll need to get GLES acceleration. I haven’t made much progress in this regard, other than learning that Qt seems to have good GLES support, and that good OpenGL 2.0 code using only VBOs is “GLES compatible,” but I have not gone digging in Avogadro yet to confirm either. The hurdle (aside from time) is the development of Avogadro 2 (a rewrite with a much cleaner design), and my inability to decide which one to devote time to ripping apart.

Overall, the “personal use” aspect of the ARM boards isn’t that hard to manage, thanks to Arch Linux ARM :D With FDT usage becoming more widespread, and emphasis on upstreaming device drivers, it seems as if it’s becoming easier to get good old Linux running on these boards as well.

The computational chemistry packages, on the other hand, have been giving me some trouble. In anticipation of my U3, I went ahead and finished my efforts to package Octopus and NWChem for Arch. Octopus was quite happy to compile and pass its test suite with the same package, using the default netlib-derived BLAS/LAPACK. NWChem also compiled successfully; however, my attempts to actually _run_ NWChem have been unsuccessful and result in something similar to:

MA_verify_allocator_stuff: starting scan ...
stack block 'gai_diag_std:z', handle 1, address (.......):
current right signature (#########) != proper right signature (*******)

(zsh) Segmentation Fault

The segfault seems to be a problem with the Global Arrays memory allocation, but nothing I have tried seems to work (OpenMPI vs MPICH, “internal” ARMCI vs ARMCI-MPI, reducing the -O level, internal vs external BLAS/LAPACK, and all permutations therein). The only other mention of a similar error I’ve seen came from a forum post about compiling on BlueGene/Q, and the solution there was to link against the optimized BLAS/LAPACK libraries. However, this is another problem, as I have not been able to get a successful compilation of OpenBLAS on my U3. Granted, ARM support in OpenBLAS is very new, but every permutation of options I attempt gives a library that causes Octopus to fail its test suite (and even attempting to use these with NWChem does not help). My other option is ATLAS, but I have no experience with it and have heard compilation/installation is a bear. I’ve also recompiled NWChem so much I just had to take a break from it.
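For reference, the kind of invocation I’ve been permuting looks roughly like this (the TARGET and threading values are just examples, not a known-good recipe; cross-check against OpenBLAS’s TargetList.txt for the ARM targets it actually supports):

```shell
# Hypothetical OpenBLAS build sketch for a Cortex-A9 class board;
# ARMV7 is OpenBLAS's generic ARMv7 hard-float target.
make TARGET=ARMV7 USE_THREAD=1 NUM_THREADS=4
make TARGET=ARMV7 PREFIX=/usr install
```

None of my variations on this have produced a library that passes Octopus’s tests, so treat it only as a starting point.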

This is when I remembered I want these packages to be able to use the Epiphany chip as well! Octopus has OpenCL support, so I just needed to have it use Coprthr for some easy Epiphany acceleration. However, not wanting to melt my board, I’ve been focusing my efforts on getting Coprthr to compile on Arch and package well (and testing whether Octopus can use libocl). Compilation fixes for Arch Linux were not too difficult, but Coprthr is not quite a team player when it comes to packaging: it hardcodes a lot of paths, supports a “prefix” but not other options (datadir, etc.), and has no DESTDIR support. Back when Epiphany support in Coprthr was announced I had actually added DESTDIR support and made Coprthr package very neatly. I never submitted a pull request. I can’t remember why, and I am kicking myself, as attempting to merge those changes now has been more effort than just starting from scratch… Coprthr also has a “test” target, but it tests the software after it’s installed, which is not ideal for a “check” function in a package. I’ve been patching the testsuite to use the local files, but it’s like peeling an onion: I keep finding more and more stuff to fix/patch/tweak. Hopefully I can finish this before those fans arrive ;]
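For anyone unfamiliar, DESTDIR support is a small change; the whole idiom is just prefixing every install path (file names below are made up for illustration):

```make
# Hypothetical Makefile fragment. DESTDIR defaults to empty, so a plain
# `make install` behaves as before, while a package build can stage files
# into the package root with `make DESTDIR="$pkgdir" install`.
# (Recipe lines must be tab-indented.)
prefix ?= /usr/local

install:
	install -Dm755 libcoprthr.so $(DESTDIR)$(prefix)/lib/libcoprthr.so
	install -Dm644 coprthr.h $(DESTDIR)$(prefix)/include/coprthr.h
```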

There’s certainly plenty to do ;] And once my fans arrive I plan to have a lot of fun playing with the eSDK (because OpenCL is kind of a boring way to program the Epiphany ;])

My new setup: One reason why I love DragonFlyBSD

I recently switched to DragonFlyBSD as my main OS, and one of my favorite new things I’ve done has been my mirroring setup.

I currently use Mercurial to manage my home directory. It works out pretty well: I have an alias that automatically stages changes, commits with an automatic message, and then pushes the changes to an SD card I keep inserted. This also allows me to back up my home to our home server and/or update my home on my other PCs using an ssh:// URL in hg push. The added benefit of merging changes when I switch machines was the main motivation behind this setup. (I was thinking of posting about this setup, but it’s not too different from what pretty much everyone else posts on this subject.)
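For the curious, the alias is nothing fancy; a sketch of the idea as a Mercurial shell alias (the alias name and push path here are made up; adjust for your own setup):

```ini
# ~/.hgrc — hypothetical version of such a sync alias
[alias]
# The '!' prefix makes this a shell alias; addremove stages new and
# deleted files, then we commit with a timestamp message and push.
homesync = !$HG addremove && $HG commit -m "sync: $(date)" && $HG push /mnt/sdcard/home-hg
```

One quirk of the && chain: hg commit exits non-zero when there is nothing new, so the push only happens after a real commit.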

Some downsides? Large binary files use a lot of memory to manage. Before we upgraded the memory in our backup server, I was using it all up and making Mercurial bail. This caused me to make an untracked directory that is a mess, and I’d have to rsync this along with pushing. I’d also have to prune the tree history when it got too big. For moving between machines this is still the best option I’ve found, but for backups it can be a little cumbersome.

With HAMMER on my laptop and backup server, I had another option: mirror-copy! I simply made a separate PFS for my home directories on the server and laptop, and called mirror-copy with the remote URL. This gives me a filesystem-level mirror on our server, including the automatic snapshots I already have on the laptop’s PFS (which get automatically pruned).
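Concretely, the setup is just a couple of hammer commands (the hostnames and PFS paths below are made up; check hammer(8) for the exact options):

```shell
# On the server, create a slave PFS that shares the master's UUID
# (run once; get the shared-uuid from `hammer pfs-status /home` on the laptop):
#   hammer pfs-slave /backup/home-laptop shared-uuid=<master's shared-uuid>
#
# Then, from the laptop, a one-shot incremental mirror over ssh:
hammer mirror-copy /home user@server:/backup/home-laptop
```

Subsequent runs only send the delta since the last mirror, which is what makes this pleasant for backups.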

Some downsides to this approach: the PFS slaves are read-only, so this is really only good for backup purposes (although I might try some mount_union tricks for when I log into the server), and it of course only works between systems that use HAMMER (currently only DragonFlyBSD, as far as I know).

Wait, why not ZFS or BTRFS? Good question. It never occurred to me to use BTRFS this way when I was running Linux, and I never bothered with ZFS on my FreeBSD machine. Technical/concrete comparisons aside, HAMMER feels really well put together and very cohesive.

Status Update: libva-epiphany-driver

On the off chance someone actually looked at my github page and followed my info here…
I had been crunching to get a libva skeleton driver up and running; my initial hope was to generate excitement for the Parallella Kickstarter. After I failed to get it done in time (but it got funded! yay!), I kept crunching to get a working demo up (to keep people excited). However, I hit a brick wall trying to debug my Huffman decoding routine, and quickly lost focus as my research drew my attention away. I had more in-progress work, including functional DRI output, that I hadn’t committed because I was trying to debug that routine.

I keep meaning to go back to it, but I still haven’t had any mental breakthroughs. Therefore, I decided to just go ahead and commit what I had. The skeleton driver works, it just doesn’t do anything :P

For some reason, I had insisted on coding the codecs from scratch (part pride, part licensing, etc…), but now I’m feeling more pragmatic. Therefore, I’ve decided to do a few things:

  1. Use libjpeg-turbo source as a reference and quickly finish up the JPEG decoding routines for the demo (concede pride).
  2. Approach problem differently! (possibly concede licensing).

The “problem” was one I made for myself: libva acts as a hardware mediator between applications and accelerated hardware, and it’s on the hardware that the codecs are implemented. The libva driver gets requests and handles setting up and communicating with the hardware, shuffling data around and such. I knew I’d have to implement the codecs somehow, but foolishly decided to implement them in the driver.

I’ve recently decided to focus my efforts on porting existing codecs/libraries to utilize Epiphany, then just have the libva-epiphany-driver as a host program that loads the separate programs onto Epiphany. This should have the benefit of reducing my workload, simplifying libva-epiphany-driver, and making it possible to receive the benefits of my porting in non-vaapi applications. And of course, porting existing projects and contributing upstream will be better overall (upstream benefits from wider adoption, I benefit from upstream contributions, etc.).

I decided to start with libjpeg-turbo, as it’d be the simplest to work with (and I wouldn’t have to worry about the BSD license). Hopefully this approach will go much better.


I just backed this kickstarter today for Parallella, and have to say I’m very excited! To me, this is the right thing to do, done the right way. I really hope it succeeds! Not just to get a dev board, but I want to see these chips proliferate and make it easier for people to do heterogeneous parallel programming.

My mind is racing with everything I want to do with it, and everything that is possible :D

Update: 10-16-2012 In their newest project update, the Parallella Kickstarter has decided on a “soft relaunch” in order to better sell the platform to a wider audience. I think it’s a great idea. They asked people to post what they plan to do with their dev boards, so the more exciting/credible ones could go on the front page to (as they said) “WOW” the non-programmers. I had already posted what I plan to do, but thought I’d reproduce my ideas here and elaborate a bit more than I would in a comment thread.

I definitely plan on playing around with computational chemistry packages on my board. I currently use Quantum ESPRESSO and Gaussian on our small cluster in lab, but have been interested in running the open source packages on more accelerated hardware. This is one of the reasons this project caught my attention.
ESPRESSO and Octopus (another comp chem package with similar goals) have preliminary/development branches with CUDA/OpenCL accelerated backends respectively, so getting them to use the Epiphany cores shouldn’t be too difficult. Getting them to run well will be fun to experiment with ;]
Both packages mostly rely on the matrix math backends to offload the work to the GPU, so assuming I get these packages utilizing the Epiphany cores, it should be simple enough to get other computational packages to do the same. I would likely start by hacking on NWChem (another popular computational chemistry package).
All these packages support some form of cluster programming through MPI in particular, so a Parallella board should be able to be dropped into an existing cluster (or make a small cluster of Parallella boards ;]).

I really don’t think this will be too difficult, and this gives me the chance to have a reasonable computational setup at home that won’t bankrupt me electricity-wise. Exciting!

…LibVA (vaapi) seems a good target as well. Writing a backend for it would allow the use of the cores for video decoding/encoding, and do so without having to mess with the base package. This seems like it might be more effort (it looks like it involves reimplementing any encoding/decoding profiles to expose to libVA), but it would allow an easy “drop in” solution that would immediately benefit mplayer/VLC/XBMC. This is a project I’d definitely be willing to work on….

I had this idea when people were discussing “out-of-box” and “media box” potential. I poked around the vaapi repo and realized the Epiphany could easily be a “backend” for it. This seemed like a bit more work, but I was willing to do it in order to generate some more excitement for the board. I resolved to start writing what I could with the documents they’ve already released. I’m hoping to have at least a skeleton project before the relaunch this Friday.

The other, crazier idea I had was to make an LLVM backend for the Epiphany chips, and use LLVM’s JIT/runtime compilation capabilities to do interesting things like: dynamically enable use of the Epiphany chip if available (much like Apple did with their OpenGL pipeline to enable software fallbacks for missing hardware features on the Intel GMA), or make it easier to have an optimizer that translates SIMD calls to relevant Epiphany kernels (which should help accelerate quite a few things). For the latter, I’m sure there is a way to do it in GCC, but my impression is the internals are not as modular, and I’d also lose the dynamic compilation possible with LLVM.

I have other even crazier ideas about what I’d do with dynamic [re]compilation, but I’ll save those for a separate post.

Bash one-liners: deleting unopened files in current directory

Gaussian ’09 can leave behind piles of temp files when things fail. These build up and take up a good chunk of disk space. Usually I just do the house-cleaning when disk space gets low, but as some files may be in use, I cannot just clean the whole directory. I usually tinker until I get something that will “delete all unopened files,” but I keep forgetting how I did it the time before (and I’ve probably done it differently each time…)
The current incarnation follows. This deletes any unopened files in the current directory:

for file in * ; do lsof "$file" > /dev/null || rm "$file" ; done

It’s not perfect, as this one complains when rm tries to delete directories. Testing for directories is unnecessary and annoying. I could silence rm with a 2> /dev/null, but the paranoid-me doesn’t want to quiet all errors.
Any suggestions welcomed. My glob was originally `ls *.*`, but I thought this excessive and likely to miss files without a suffix.
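In that spirit: if the directory noise ever does bother me, one variation quiets only those complaints by skipping non-regular files (filenames quoted for safety), while still letting rm report its real errors:

```shell
# Delete any regular file in the current directory that no process
# has open; directories and other non-files are skipped silently.
for file in * ; do
    [ -f "$file" ] || continue              # skip directories et al.
    lsof "$file" > /dev/null 2>&1 || rm "$file"
done
```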


I’ve been playing/looking with/for libraries for a project I’m thinking about, and got to the problem of font rendering. All signs seemed to point to freetype2, but I was hesitant. I just wanted something lightweight (only rendered fonts), preferably BSD/similarly licensed, but would give me good looking results.

My impression was that the font system under the *nixes+Xorg was libxft working with freetype2, and that all those gross XML files were from freetype2. Thus I was worried that I’d have to pull in all that cruft just to render some fonts. However, this is when I actually did some digging and found out libxft is built on fontconfig (which is responsible for all the XML crap), and that freetype2 is actually a separate and minimalistic library that seems very well put together.

It just seems the right way to do it.

Please don’t learn to use a screwdriver

Let’s face it, things have gotten out of control. There really should be no need for everyone to learn how to use a screwdriver. Most of our PCs come pre-assembled, and the ones that don’t should only be assembled by people who know what they are doing. From my experience, people using screwdrivers can only contribute more overtightened screws to the world, or poor usage of screws. If my drawer breaks, I’m not going to try and fix it myself; I have no idea how drawers work, and I should leave the job up to someone who knows what they are doing.

As an example, let’s apply some reductio ad ridiculum to this:
“If we don’t learn to screw we risk being screwed ourself. Screw or be screwed.” – Douglas Rushkoff

Parody aside, do I believe programming is a life skill everyone should be exposed to? Yes, definitely. Should we all be programmers? Probably not. I think Jeff makes some good points, but I feel his analogy is off. I also think his conclusion is elitist, and personally I support meritocracy over elitism. Let’s employ the programmers who prove themselves. Let’s use the good code, but also let’s give everyone the tools to identify good code. Let’s give everyone the chance to prove they are good coders. But most of all, let’s help people learn how to solve problems–just like Jeff says in his article. The key point I differ on is that I support people learning to code in order to strengthen and supplement their problem-solving skills.

How do we solve problems if we don’t learn how to use the tools? What’s wrong with coding your own solutions for your own problems? Even if it’s ugly code, it works for you and that’s all it needs to do. How many of you can honestly say your custom shell scripts are programming gems? (Mine sure aren’t) Do you want to argue that they solved problems that other people are more qualified to solve and we should leave the job up to them?

I also find Jeff’s article aggravating because I see it as further justification of an attitude I despise–especially in my co-workers. They only use software that does everything for them, and if it doesn’t do what they want, they complain: “why didn’t they just make it do XYZ?” “why didn’t they do it this way instead of that way?” “Oh! I’m going to pay hundreds of dollars for this software because it will normalize my graphs for me!” If only they realized that with the proper tools, they could fix it them-damned-selves. They don’t need to be master programmers to normalize a set of graphs; they just need to make a stupid formula (and being chemistry grad students, they damned better know how to mathematically normalize something). Perhaps with some knowledge of variables and assignment, they could make a macro they can apply to data sets, or something! Instead of whining and waiting for the next black box to descend from the higher-ups, they could do it themselves.

