Reading CCJL (Compute Canada Job Logs format)


CCJL (Compute Canada Job Logs) is a delimited key-value pair format used to amalgamate job logs from legacy clusters and load them into the CCDB. It is still used for some purposes.

Each line is one job record. Fields are separated by | and each key is separated from its value by :. A file looks like this:

job_id:79065646|rapi:abc-123-aa|start_time:1367378671|submit_time:1367378670|state:done|ccri:haz-605-01|resource_id:sharcnet.redfin|memory_requested:<NULL>|end_time:1367383362|job_type:serial|scheduling_class:\N|suspension_history:<NULL>|ncores:1
job_id:79065645|rapi:abc-123-aa|start_time:1367378512|submit_time:1367378512|state:done|ccri:haz-605-01|resource_id:sharcnet.redfin|memory_requested:<NULL>|end_time:1367378513|job_type:serial|scheduling_class:\N|suspension_history:<NULL>|ncores:1
job_id:79069076|rapi:abc-123-aa|start_time:1367426895|submit_time:1367426894|state:done|ccri:haz-605-01|resource_id:sharcnet.redfin|memory_requested:<NULL>|end_time:1367426896|job_type:serial|scheduling_class:\N|suspension_history:<NULL>|ncores:1
job_id:79069223|rapi:bcd-234-aa|start_time:1367430921|submit_time:1367430921|state:done|ccri:shr-354-01|resource_id:sharcnet.redfin|memory_requested:<NULL>|end_time:1367430932|job_type:mpi|scheduling_class:\N|suspension_history:<NULL>|ncores:24

CCJL can be read directly by the structured data tool Miller: it is Miller's DKVP (delimited key-value pairs) format with the field and pair separators overridden. As a simple example, this is how you would convert CCJL to JSON:

$ mlr --idkvp --ifs='|' --ips=':' --ojson cat file.ccjl

…which gives this output:

[
{
  "job_id": 79065646,
  "rapi": "abc-123-aa",
  "start_time": 1367378671,
  "submit_time": 1367378670,
  "state": "done",
  "ccri": "haz-605-01",
  "resource_id": "sharcnet.redfin",
  "memory_requested": "<NULL>",
  "end_time": 1367383362,
  "job_type": "serial",
  "scheduling_class": "\\N",
  "suspension_history": "<NULL>",
  "ncores": 1
},
{
  "job_id": 79065645,
  "rapi": "abc-123-aa",
  "start_time": 1367378512,
  "submit_time": 1367378512,
  "state": "done",
  "ccri": "haz-605-01",
  "resource_id": "sharcnet.redfin",
  "memory_requested": "<NULL>",
  "end_time": 1367378513,
  "job_type": "serial",
  "scheduling_class": "\\N",
  "suspension_history": "<NULL>",
  "ncores": 1
},
{
  "job_id": 79069076,
  "rapi": "abc-123-aa",
  "start_time": 1367426895,
  "submit_time": 1367426894,
  "state": "done",
  "ccri": "haz-605-01",
  "resource_id": "sharcnet.redfin",
  "memory_requested": "<NULL>",
  "end_time": 1367426896,
  "job_type": "serial",
  "scheduling_class": "\\N",
  "suspension_history": "<NULL>",
  "ncores": 1
},
{
  "job_id": 79069223,
  "rapi": "bcd-234-aa",
  "start_time": 1367430921,
  "submit_time": 1367430921,
  "state": "done",
  "ccri": "shr-354-01",
  "resource_id": "sharcnet.redfin",
  "memory_requested": "<NULL>",
  "end_time": 1367430932,
  "job_type": "mpi",
  "scheduling_class": "\\N",
  "suspension_history": "<NULL>",
  "ncores": 24
}
]
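
Where Miller is not available, the same parsing can be approximated in a few lines of Python. This is a minimal sketch, not part of any CCDB tooling: the helper name parse_ccjl_line is hypothetical, and treating <NULL> and \N as missing values is an assumption based on how they appear in the sample above. Unlike Miller, it does not infer numeric types, so every value stays a string.

```python
import json

# Markers mapped to None. That these denote missing values is an assumption.
NULL_MARKERS = {"<NULL>", r"\N"}


def parse_ccjl_line(line):
    """Parse one CCJL record into a dict (hypothetical helper)."""
    record = {}
    for pair in line.rstrip("\n").split("|"):
        if not pair:
            continue  # skip empty fields, e.g. from a trailing separator
        # Split on the first ":" only, so values may themselves contain ":".
        key, _, value = pair.partition(":")
        record[key] = None if value in NULL_MARKERS else value
    return record


# Abbreviated copy of the last sample record; raw string keeps the backslash.
sample = (
    r"job_id:79069223|rapi:bcd-234-aa|state:done|memory_requested:<NULL>"
    r"|job_type:mpi|scheduling_class:\N|ncores:24"
)
record = parse_ccjl_line(sample)
print(json.dumps(record, indent=2))
```

Splitting on the first : only is a deliberate choice: keys in the sample never contain a colon, but nothing guarantees that values will not.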

Miller can do much more. For example, this counts jobs by job type:

$ mlr --idkvp --ifs='|' --ips=':' --opprint stats1 -a count -f ccri \
   -g job_type file.ccjl
job_type ccri_count
serial   3
mpi      1
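
For comparison, a per-job-type count like the one above can be sketched in Python with collections.Counter. The function name count_by_field is hypothetical, and the sample lines are abbreviated versions of the records shown earlier. One difference is that this sketch counts records that carry the grouping field itself, rather than counting occurrences of a second field such as ccri within each group.

```python
from collections import Counter


def count_by_field(lines, field):
    """Count CCJL records per distinct value of `field` (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # Split each "key:value" pair on the first ":" only.
        pairs = (p.partition(":") for p in line.strip().split("|") if p)
        record = {key: value for key, _, value in pairs}
        if field in record:
            counts[record[field]] += 1
    return counts


# Abbreviated copies of the four sample records.
sample_lines = [
    "job_id:79065646|ccri:haz-605-01|job_type:serial|ncores:1",
    "job_id:79065645|ccri:haz-605-01|job_type:serial|ncores:1",
    "job_id:79069076|ccri:haz-605-01|job_type:serial|ncores:1",
    "job_id:79069223|ccri:shr-354-01|job_type:mpi|ncores:24",
]

counts = count_by_field(sample_lines, "job_type")
for job_type, n in counts.items():
    print(job_type, n)
```

To run this against a real file, pass an open file object instead of the list, since iterating over a file yields its lines.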

See also