SciDB tips

I’m working on a project that will use SciDB for an analysis of MODIS data. This post is a dumping ground for small things that I wish I had known at the start of this process. Most of these are obvious in retrospect, but I was confused and maybe others are as well.

Boolean logic operators

I couldn’t find any reference to Boolean logic operators in the SciDB documentation. It turns out that these are the same as in SQL: and, or, not. For example:

SELECT * FROM my_array WHERE a > 0 AND (b = 1 OR c = 2);

List of all SciDB functions

The SciDB user manual breaks functions into categories, but sometimes it’s more useful to see a list of all the available functions. This is easy to get using the list operator:

AFL% list('functions');

Use list('operators'); to see available operators.

Use null with the between operator

The between operator selects a region of an array by restricting one or more dimensions to a range. However, you have to specify a low and a high value for every dimension, including the ones you don’t care about. I started out by copying the min and max possible values for dimensions that I did not want to filter. I later learned that you can pass null for dimensions that you don’t want to filter (the manual doesn’t seem to mention this). For example, I have an array of vegetation index data from the MODIS MOD13Q1 product. The schema of my array is:

mod13q1 <evi:int16> [year  = 2001:2001,  1,      0,
                     month = 1:12,       12,     0,
                     day   = 1:31,       10,     0,
                     i     = 0:23039999, 170000, 0]

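This array has four dimensions: year, month, day, and i (the pixel index). The arguments to between are the low values for all dimensions, in schema order, followed by the high values in the same order. If I wanted to extract measurements for January through March in all years, I could use this query: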

between(mod13q1, null, 1, null, null, null, 3, null, null)
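Counting nulls by hand gets error-prone once an array has more than a couple of dimensions. Here is a minimal Python sketch that builds the between call as a string from only the dimensions you want to restrict. The helper name between_query and the MOD13Q1_DIMS list are my own inventions for illustration, not part of SciDB or any client library; the sketch only formats text and does not talk to the database.

```python
def between_query(array, dims, ranges):
    """Build an AFL between() call as a string (hypothetical helper).

    array:  name of the SciDB array.
    dims:   dimension names in schema order.
    ranges: {dimension: (low, high)}; dimensions left out, or bounds
            given as None, are written as null (no restriction).
    """
    unknown = set(ranges) - set(dims)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")

    def fmt(value):
        return "null" if value is None else str(value)

    # between() takes all low bounds first, then all high bounds.
    lows = [fmt(ranges.get(d, (None, None))[0]) for d in dims]
    highs = [fmt(ranges.get(d, (None, None))[1]) for d in dims]
    return f"between({array}, {', '.join(lows + highs)})"


# Dimension order copied from the mod13q1 schema above.
MOD13Q1_DIMS = ["year", "month", "day", "i"]

query = between_query("mod13q1", MOD13Q1_DIMS, {"month": (1, 3)})
print(query)
```

For the January through March case this prints the same query shown above, which is a handy sanity check that the low and high bounds end up in the right positions.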

Set the missing reason code

SciDB supports multiple types of missing data. Each null cell can have a missing reason code that indicates why the data is missing (for example, instrument failure, cloud cover, etc.). The SciDB manual explains how to represent the missing reason code when loading data into SciDB, but it wasn’t clear how to set the code once the data was already in a SciDB array. I had a large array already loaded into SciDB, and I wanted to replace a certain value with a null value and a missing reason code. It turns out that the function missing can do this. In my case, I wanted to replace the value -1000 with the missing reason code 1, which I accomplished with the following query:

UPDATE gimms SET ndvi = missing(1) WHERE ndvi = -1000;
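
If an array uses several sentinel values, each standing for a different reason, you end up writing one such UPDATE per value. A small Python sketch of generating those statements follows. The function name missing_update and the dictionary layout are hypothetical, and the code only produces the AQL text; running it against SciDB is left to whatever client you use.

```python
def missing_update(array, attribute, sentinel, reason_code):
    """Return an AQL UPDATE that replaces a sentinel value with a null
    carrying the given missing reason code (hypothetical helper)."""
    if not isinstance(reason_code, int) or reason_code < 0:
        raise ValueError("reason_code must be a non-negative integer")
    return (
        f"UPDATE {array} SET {attribute} = missing({reason_code}) "
        f"WHERE {attribute} = {sentinel};"
    )


# Sentinel value -> missing reason code. Only the pair from this post
# is listed; add further pairs for your own data.
sentinels = {-1000: 1}

statements = [
    missing_update("gimms", "ndvi", value, code)
    for value, code in sentinels.items()
]
for statement in statements:
    print(statement)
```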