Skip to content

ARM Boards and Me

April 29, 2014

After a long time waiting and hoping, I finally received my Parallella board! …to be reminded I didn’t get the accessory pack and that it runs as hot as a stove (damnable Zynq chip…). I was able to boot it up on USB power, and keep it cool with a box fan I had nearby, but of course this is not workable. I ordered power supplies and small 5V fans from Jameco, but they will not be here until next week >:/ In the meantime, I thought I should write down what I’ve been doing so far in my quest to use ARM boards for my personal and professional needs.

For ARM boards in general, ideally I would like to run Octopus and NWChem (two of my favorite computational packages), and also to have them use an ARM optimized BLAS/LAPACK implementation. With regard to graphics, the mobile GPUs in ARM SoCs all (except for Nvidia’s newer K1 I think) implement OpenGL ES as opposed to full “desktop” class OpenGL, necessitating some porting to run most of the software I use on my laptop.

Working on these was why I purchased the Odroid U3. In hindsight, I probably should have gotten the XU… (I decided to holdout for the XU2 which I’ve since learned _may_ come out Q3…). With the U3 I underestimated the progress of Lima and the state of the Mali blobs, underestimated how much I’d want OpenCL support, didn’t realize the XU had better upstream/FDT support, and didn’t realize the Cortex A15s have PAE (would have allowed me to natively compile Firefox… except ‘ld’ is still 32-bit so nevermind :P). That being said, for the price and size, I love my U3. I’ve been casually splitting my time between making it “personal use”-able and having it run as a computational node (so I don’t have to keep running my enormous 200W rack mount server). I’m not quite there in both regards. For personal use the two main detractors for me are some quirkiness in the xf86-video-fbturbo driver (it occasionally doesn’t properly dirty a region and this can lead to some display corruption), and the lack of GLES support in applications. The GLES is mostly just taking the time to recompile packages with GLES support and without glx, and then making sure they properly find the Mali blobs. However, the show stopper is Avogadro.

The newest release of Avogadro needed a small patch to compile on ARM, and it runs with software glx support. However, if I wished to retain my sanity I’d need to get GLES acceleration. I haven’t made much progress in this regard other than learning Qt seems to have good GLES support, and good OpenGL 2.0 code that uses only VBOs is “GLES compatible,” but I have not gone digging in Avogadro yet to see about either case. The hurdle (aside from time), is the development of Avogadro 2 (a rewrite with a much cleaner design), and my inability to decide which one to devote time to ripping apart.

Overall, the “personal use” aspect of the ARM boards isn’t that hard to manage, thanks to Arch Linux ARM :D With FDT usage becoming more widespread, and emphasis on upstreaming device drivers, it seems as if it’s becoming easier to get good old Linux running on these boards as well.

The computational chemistry packages on the other hand, have been giving me some trouble. In anticipation of my U3, I went ahead and finished my efforts to package Octopus and NWChem for Arch. Octopus was quite happy to compile and pass its test suite with the same package, using the default netlib derived BLAS/LAPACK. NWChem also compiled successfully, however my attempts to actually _run_ NWChem have been unsuccessful and result in something similar to :

MA_verify_allocator_stuff: starting scan ...
stack block 'gai_diag_std:z', handle 1, address (.......):
current right signature (#########) != proper right signature (*******)

(zsh) Segmentation Fault

The segfaulting seems to be a problem with the Global Arrays memory allocation, but nothing I have been trying seems to work (OpenMPI vs MPICH, “internal” ARMCI vs ARMCI-MPI, reducing the -O level, internal vs external BLAS/LAPACK, and all permutations therein). The only other mention of a similar error I’ve seen came from a forum post for compiling on BlueGene/Q, and the solution was to link to the optimized BLAS/LAPACK libraries. However, this is another problem, as I have not been able to get a successful compilation of OpenBLAS on my U3. Granted, ARM support in OpenBLAS is very new, but any permutation of options I attempt gives a library that causes Octopus to fail it’s testsuite (and even attempting to use these with NWChem does not help). My other option is ATLAS, but I have no experience with and have heard compilation/installation is a bear. I’ve also recompiled NWChem so much I just had to take a break from it.

This is when I remembered I want these packages to be able to use the Epiphany chip as well! Octopus has OpenCL support, so I just needed to have it use Coprthr for some easy Epiphany acceleration. However, not wanting to melt my board I’ve been focusing my efforts on getting coprthr to compile on Arch and package well (and test if Octopus can use libocl). Compilation fixes for Arch Linux were not too difficult, but Coprthr is not quite a good team-player when it comes to packaging as it hardcodes a lot of paths, supports a “prefix” but not other options (datadir, etc), and has no DESTDIR support. Back when Epiphany support in Coprthr was announced I had actually added DESTDIR support and made coprthr package very neatly. I never submitted a pull-request. I can’t remember why and I am kicking myself as attempting to merge the changes since then has been more effort than just starting from scratch… Coprthr also has a “test” target, but this tests the software after it’s installed, not ideal for a “check” function in a package. I’ve been patching the testsuite to use the local files, but it’s like peeling an onion and finding more and more stuff to fix/patch/tweak. Hopefully I can finish this before those fans arrive ;]

There’s certainly plenty to do ;] And once my fans arrive I plan to have a lot of fun playing with the eSDK (because OpenCL is kind of a boring way to program the Epiphany ;])

From → Uncategorized

  1. stefan permalink

    Did you ever get NWchem working on the Parallella? If so how’s the performance? I’m thinking of doing something similar with some computational packages. Thanks

    • Sadly no–I still have not been able to get NWChem to work on any ARM-based board (under Arch Linux). I keep coming back to this, and I have figured out it has something to do with the guards on the MPI allocated memory (it fails to set the second guard value for some reason)–this seems to be MPI and GA implementation independent though. I’m still picking this apart to figure out what exactly is going on.

      …the sad part is, even if I could get it to work, it would only run on the Zynq, and wouldn’t leverage the Epiphany chip. NWChem does have GPGPU capabilities, but only through CUDA. In order to get it use the Epiphany chip, I would need to port a good chunk of stuff–either to OpenCL or add in custom Epiphany accelerators (this is intimidating, I’m already overwhelmed picking about the GA portions).

      I held out hopes for Octopus, as it uses OpenCL for it’s GPGPU, and should have be super easy to have it use the Epiphany chip. However, all this has uncovered for me is just how non-conformant coprthr is as an OpenCL implementation…

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s