BenchmarkAnything

Generic tools around benchmarking

View the Project on GitHub benchmarkanything

Introduction

BenchmarkAnything consists of a data schema to represent measurement results, tools to work with such data, a data storage to store them, and a query data format to search in such a data store.

Measurement data

The major data abstraction: a key/value like data structure - with a twist.

See the following example, here expressed in YAML:

BenchmarkAnythingData:
  - NAME: example.metric.name1
    VALUE: 12.3
    UNIT: s
    key1: info1
    key2: info2
  - NAME: example.metric.name1
    VALUE: 13.7
    UNIT: s
    key1: some-other-info
    key2: even-more-info
  - NAME: example.yet.another.metric
    VALUE: 87.2
    gerrit_change: 4226
    gerrit_branch: master
        

It starts with hash entry BenchmarkAnythingData so that we can embed and find it inside surrounding other data, and the value of that hash entry is a list of hashes where each such list element represents a single data point of some measurement. Each such data point has a NAME for the metric it belongs to and a VALUE. These are the only two mandatory keys. Additionally there can be any other key/value pairs.

By convention UPPERCASED keys are reserved for special meaning, like here the UNIT (the data storage augments it with some more, like a unique VALUE_ID and a CREATED timestamp).

It is your choice how to split semantic meaning between the metric NAME and describing additional key/value pairs. It mostly depends on how you plan to query them later. Usually there is a balance between the complexity of metric names to express what the benchmark was about and does know by "itself", like

and the additional keys to describe the environment in which the benchmark was executed and usually are contributed from some outer context that the benchmark itself does not know about.

In the above example that detail .20.10.10M is already a hint about the cargo. You could argue to make this an additional key/value information. Deciding this in a way that makes your later evaluation more useful, is the hard part. :-)

Query language

The query language is essentially a list of tuples consisting of a comparison operator, the key, and the searched value, plus some surrounding "influencers".

Synopsis, embedded in actual curl lines:

You should also try curl ... | dpath -o flat / for line based awk-ward output.

See more details here.

Embedding benchmark results in TAP

The Tapper infrastructure has the whole BenchmarkAnything subsystem embedded. It implicitely extracts and stores BenchmarkAnythingData from TAP results into a database and provides web-form based simple dashboards.

It is also possible to embed BenchmarkAnything data in TAP output. The key to this is that TAP allows embedded YAML between some begin/end markers (--- and ..., .i.e., 3 dashes and 3 dots), like this:

1..3
ok - some description
ok - measurement data
  ---
  BenchmarkAnythingData:
    - NAME:       bmk.sock.send_data.dd.devzero.duration.20.10.1M
      VALUE:      0.956
      UNIT:       s
      cargo_size: 10485760
      magic_enabled: 1
      cpu_pinning: ~
      cpu_number: ~
      virt_provider: linux-kvm
      network_provider: vio_foo
      benchmark_fastmode: 1
  ...
ok - some other descriptio
                  

I.e., you have an anchoring ok line, followed by an embedded begin marker of 3 dashes ---, the actual results introduced with BenchmarkAnythingData followed by an array of results, and an embedded end marker of 3 dots ....

You can have multiple such ok/YAML combinations, or multiple results inside the same single array under BenchmarkAnythingData. All of the single data points will be picked up and stored in the database.

That's it.

Working locally

Install dpath and create a script that outputs your TAP and try to extract your values like this (instead of echo, take your great script):

echo "
1..1
ok - results
  ---
  BenchmarkAnythingData:
   - NAME: some.metric.name
     VALUE: 12.7
   - NAME: some.metric.name
     VALUE: 12.521
   - NAME: some.metric.name
     VALUE: 13.1
   - NAME: some.very.different.metric.name
     VALUE: 14.8
  ...
" | dpath -i tap "//BenchmarkAnythingData/*/NAME[value eq 'some.metric.name']/.."
                  

This is essentially the same what Tapper and the query APIs do behind the scene, all in one go.