Treeherder loading data from Pulse

As our infrastructure moves away from BuildBot and toward Task Cluster, we need to update how job and build data gets to Treeherder. This is an upcoming feature that will roll out over the next few weeks. As with any feature that requires coordination between teams, I expect we will have to iterate on it as we move forward, but this is the basic layout of how it will work.

As of now, build and test data gets to Treeherder in one of two ways:

  1. BuildBot: We Extract, Transform and Load (ETL) the JSON files pending.js, running.js and builds4hr.js, and persist that information in the Treeherder database.
  2. Non-BuildBot projects (Task Cluster, Autophone, etc.): those projects must post their data via our REST API. To get their data into both Production and Stage, they must post it to each independently.

This also means that, with method #2, I can’t ingest any data on my local machine from Task Cluster or the other projects that post to Treeherder Stage/Production.
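
To make the duplication in method #2 concrete, here is a minimal sketch of what a submitting project has to do today. The host names and the endpoint path are hypothetical placeholders, not Treeherder's actual URLs, and the requests are only built, never sent:

```python
import json
from urllib.request import Request

# Hypothetical hosts, for illustration only (not the real Treeherder URLs).
TREEHERDER_HOSTS = [
    "https://treeherder-stage.example.invalid",
    "https://treeherder-prod.example.invalid",
]


def build_job_requests(project, jobs, hosts=TREEHERDER_HOSTS):
    """Build one POST request per Treeherder instance for the same jobs."""
    body = json.dumps(jobs).encode("utf-8")
    return [
        Request(
            # Hypothetical endpoint path; the real API may differ.
            url="{}/api/project/{}/jobs/".format(host, project),
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        for host in hosts
    ]
```

Every instance that should see the data needs its own request, and a local development instance is simply not on the submitter's list.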

The Treeherder team has worked with Jonas Jensen and Greg Arndt from Task Cluster to come up with a better solution using Pulse Exchanges.  Below is a diagram of how this will work:

[Diagram: Treeherder Pulse-4]

In a nutshell:

  1. Treeherder provides a JSON Schema defining exactly what a job can/must look like.
  2. An external application (I’m looking at you, Task Cluster) will create a Pulse Exchange via Pulse Guardian.
  3. The external application will file a bug or create a pull request against Treeherder to update the “Exchange Config” in “base.py”.  This config specifies their exchange name(s) and which repositories they will post jobs for.
  4. Treeherder creates Pulse Queues to listen to all registered Pulse Exchanges.
  5. The external application (I’m looking at you, fictional Oscillation Overthruster) will start posting jobs to their exchange.
  6. Treeherder reads jobs with matching routing keys from Pulse, queues them asynchronously and writes them to the Treeherder database.
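
As a rough sketch of step 6, the snippet below filters incoming messages by routing key and queues the valid ones. It makes several assumptions that are not from Treeherder itself: routing keys are taken to look like `<destination>.<project>`, matching follows AMQP topic-exchange rules (`*` is exactly one word, `#` is zero or more), and `REQUIRED_FIELDS` is a hypothetical stand-in for the real JSON Schema check. A plain `deque` stands in for the asynchronous task queue:

```python
from collections import deque

# Hypothetical stand-in for the JSON Schema in step 1.
REQUIRED_FIELDS = ("project", "job_guid", "state")


def topic_matches(binding_key, routing_key):
    """AMQP topic-style match: '*' is exactly one word, '#' is zero or more."""
    def match(pattern, words):
        if not pattern:
            return not words
        head, rest = pattern[0], pattern[1:]
        if head == "#":
            # '#' may absorb zero or more leading words.
            return any(match(rest, words[i:]) for i in range(len(words) + 1))
        if not words:
            return False
        return (head == "*" or head == words[0]) and match(rest, words[1:])

    return match(binding_key.split("."), routing_key.split("."))


def is_valid_job(job):
    """Very rough validity check; the real check would use the JSON Schema."""
    return all(field in job for field in REQUIRED_FIELDS)


def ingest(messages, binding_keys, queue):
    """Queue each valid job whose routing key matches one of the bindings."""
    for routing_key, job in messages:
        wanted = any(topic_matches(key, routing_key) for key in binding_keys)
        if wanted and is_valid_job(job):
            queue.append(job)
    return queue
```

A local instance could bind with something like `ingest(messages, ["staging.#"], deque())` and receive the same jobs as Stage, which is the point of moving to Pulse.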

Et Voila!  Your dish is served piping hot.

The Exchange Config is specified in Treeherder’s “base.py” as a JSON-like Python list:

[
    {
        "name": "exchange/taskcluster-treeherder/jobs",
        "projects": [
            "mozilla-central",
            "mozilla-inbound",
            # other repos TC can submit to
        ],
        "destinations": [
            "production",
            "staging",
        ],
    },
    {
        "name": "exchange/treeherder-test/jobs",
        "projects": [
            "mozilla-inbound",
        ],
        "destinations": [
            "production",
            "staging",
        ],
    },
    # ... other CI systems
]
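
One way to picture how a config like this could be used is to expand it into (exchange, routing key) pairs for the queues in step 4. This is only a sketch: the `EXCHANGE_CONFIG` variable name, the `expand_bindings` helper and the `<destination>.<project>` routing key format are assumptions made for illustration, not Treeherder's actual code:

```python
from itertools import product

# Same shape as the config above; the variable name is hypothetical.
EXCHANGE_CONFIG = [
    {
        "name": "exchange/taskcluster-treeherder/jobs",
        "projects": ["mozilla-central", "mozilla-inbound"],
        "destinations": ["production", "staging"],
    },
    {
        "name": "exchange/treeherder-test/jobs",
        "projects": ["mozilla-inbound"],
        "destinations": ["production", "staging"],
    },
]


def expand_bindings(config, destination=None):
    """Return (exchange, routing_key) pairs, optionally for one destination."""
    bindings = []
    for source in config:
        pairs = product(source["destinations"], source["projects"])
        for dest, project in pairs:
            if destination is None or dest == destination:
                # Assumed routing key format: <destination>.<project>
                bindings.append((source["name"], "{}.{}".format(dest, project)))
    return bindings
```

Under these assumptions, a Stage instance would call `expand_bindings(EXCHANGE_CONFIG, destination="staging")` and bind its queue only to the staging keys.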

Once the feature has landed on our “master” branch, you can read more about it in the Treeherder Documentation.  There you will find more information about testing this in a local Treeherder Vagrant environment.

About cheshirecam

I'm a web developer who lives in Bend, Oregon with my wife Dakota and 2 little girls Isabella and Scarlett.