On technology transfer - Formal Methods and Tools

On technology transfer: experience from the CARP project... and beyond Anton Lokhmotov, dividiti Lorentz Center workshop on verification of concurrent and distributed software, Leiden, 17 September 2015

A view from @LegoAcademics

“The @LegoAcademics wonder how much time Newton spent courting strategic industry commercialization partners.”

My view: courting industry is good! “The @LegoAcademics wonder how much time Newton spent courting strategic industry commercialization partners.” ● ● ● ●

Must be a frustrated response to research funding guidelines. Indeed, not all kinds of research can be commercialised. But as this talk should hopefully demonstrate, those kinds than can would do well from collaborating with industry. (After all, even Newton, according to a popular account, had a contact with… )

My background ● ● ●

PhD (Cambridge, 2008); MSc (Moscow PhysTech, 2004). (2008-2009) Post-doctoral research associate at Imperial College London. (2010-2015) Technical lead and manager for the GPU compute compilers for the ARM Mali GPU series: ○ Industry-first OpenCL 1.1 Full Profile implementation for mobile. ○ Industry-first RenderScript GPU support. ○ Innovation and technology transfer champion: ■ GPUVerify integrated into Mali Graphics Debugger (in this talk). ■ PENCIL, a platform-neutral compute intermediate language. ■ PENCIL-to-OpenCL polyhedral compiler. ■ BLAS for Mali, an autotuned VOBLA-to-PENCIL implementation. ■ Collective Knowledge validated for performance analysis and predictive analytics (flagship activity of dividiti).

Alastair F. Donaldson ● ● ● ●

●

Advisor in Software Reliability, dividiti (since 2015). Senior Lecturer / Leader of the Multicore Programming Group (http: //multicore.doc.ic.ac.uk), Imperial College London (since 2011). Coordinator, EU CARP project (http://carpproject.eu, 2011-2015). ○ Involving ARM, INRIA, Codeplay, Monoidics (acquired by Facebook). GPUVerify (with Multicore Programming Group + Shaz Qadeer, Microsoft) ○ Static analysis tool for GPU kernels (OOPSLA’12, OOPSLA’13, POPL’ 14, CAV’14). ○ Integrated with the ARM Mali Graphics Debugger (since v2.1). CLsmith (with Christopher Lidbury, Andrei Lascu and Nathan Chong) ○ Random test generator for OpenCL compilers (PLDI’15). ○ Found 50+ compiler bugs in OpenCL drivers from Intel, AMD, NVIDIA, Altera, ARM.

Specialisation is key to efficiency High performance systems support innovative applications ●

Energy efficiency has become as important as execution speed.

Heterogeneous hardware ● ●

Clusters of general-purpose processors (e.g. ARM big.LITTLE CPUs). Special-purpose accelerators (e.g. ARM Mali GPUs).

Software? ●

(this talk)

Compute APIs: 2007-2014 API

CUDA

OpenCL

DirectCompute

RenderScript

Originator

NVIDIA

Khronos (Apple)

Microsoft

Google

Year

2007

2008

2009

2011

Area

HPC, desktop

HPC, desktop, mobile, embedded

Desktop

Mobile

OS

Windows, Linux, MacOS

Windows, Linux, MacOS (10.6+)

Windows (Vista+)

Android (3.0+)

Devices

GPUs (NVIDIA)

CPUs, GPUs, etc.

GPUs (desktop)

CPUs, GPUs

Work unit

Kernel

Kernel

Compute shader

Compute script

Language

CUDA C/C++

OpenCL C

HLSL

Script C

Form

Source, PTX

Source, SPIR

Source, bytecode

LLVM bitcode

Compute APIs: 2014-? API

CUDA, OpenCL, DirectCompute, C++ AMP...

Metal

Vulkan

Apple

Khronos

2014

2015

Area

Mobile, desktop

HPC?, desktop, mobile, embedded

OS

iOS (8.0+), OS X (10.11+)

Linux, Windows, Android (5.?)

Devices

CPUs, GPUs

CPUs, GPUs, etc.

Originator Year

2007-2011

Work unit Language Form

C++ based SPIR-V

Open-C-L-ephant (see SYCL&SPIR?)

OpenCL: programming model ● The host (e.g. CPU) manages a collection of devices (e. g. 2 GPUs) using OpenCL API calls ● Host code is compiled using a system compiler (e.g. GCC) ● Code to be executed on a device is compiled using the device’s compiler invoked at runtime ● Device code executes as a set of work-items, or threads ● Key idea – recast data-parallel loops as kernel invocations

OpenCL: kernel /* Vector addition in C */

/* Vector addition in OpenCL C */

void vadd (size_t n, float * restrict c, const float * restrict a, const float * restrict b) { for (size_t k = 0; k < n; ++k) c[k] = a[k] + b[k]; }

kernel void vadd ( global float * restrict c, const global float * restrict a, const global float * restrict b) { size_t k = get_global_id(0); c[k] = a[k] + b[k]; }

/* executed sequentially */

/* executed over n work-items */

GPUVerify ● http://multicore.doc.ic.ac.uk/tools/GPUVerify/ ● Tool for formal static analysis of GPU kernels written in OpenCL and CUDA: ○ ○

Can prove that kernels are free from certain defect types, including data races and barrier divergence. Can find real bugs.

● Recognised by industry: ○

Integrated with the ARM Mali Graphics Debugger (since v2.1) http://community.arm.com/groups/arm-mali-graphics/blog/2015/04/14/debugging-openclapplications-with-mali-graphics-debugger-v21-and-gpuverify

○

Imagination Technologies’ third-party showcase http://community.imgtec.com/showcase/single-showcase/gpuverify

GPUVerify: racy kernel INPUT

OUTPUT

//--local_size=1024 //-global_size=1024 kernel void add_neighbour ( local int *A, size_t offset) { size_t tid = get_local_id(0); A[tid] = A[tid]+A[tid+offset]; }

Verification failed. gpuverify-source-hfjW48.cl: error: possible read-write race on A[1]: Write by work item 1 in work group 0, gpuverify-sourcehfjW48.cl:13:3: A[tid] += A[tid + offset]; Read by work item 0 in work group 0, gpuverify-source-hfjW48. cl:13:3: A[tid] += A[tid + offset]; Bitwise values of parameters of 'add_neighbour': offset = 1 GPUVerify kernel analyser finished with 0 verified, 1 error

GPUVerify: fixed kernel INPUT

OUTPUT

//--local_size=1024 //-global_size=1024 kernel void add_neighbour ( local int *A, size_t offset) { size_t tid = get_local_id(0); int t = A[tid]+A[tid+offset]; barrier( CLK_LOCAL_MEM_FENCE); A[tid] = t; }

GPUVerify kernel analyser finished with 1 verified, 0 errors Verified: gpuverify-source-2I6div.cl - no data races within work groups - no data races between work groups - no barrier divergence - no assertion failures (but absolutely no warranty provided)

GPUVerify: latent atomic bug ● Code tested in 2010 on an NVIDIA GPU hang on a GPU from a different vendor in 2014… ● GPUVerify complained about the bug straightaway! ● (I can hardly spot it now, even though I know it’s there...) void atomic_add_float(volatile LOCAL float* source, float operand) { volatile LOCAL int* sourceAsInt = (volatile LOCAL int*) source; int oldVal; int oldVal_2 = 0; do { oldVal = oldVal_2; float newVal = as_float(oldVal) + operand; oldVal_2 = atomic_cmpxchg(sourceAsInt, oldVal, as_int(newVal)); } while (oldVal_2 != oldVal); }

ARM Mali Graphics Debugger :) Mali Graphics Debugger supports integration with GPUVerify since v2.1: ● ● ●

Traces user OpenCL application. Intercepts kernels and their launch parameters. Calls GPUVerify executable and returns its output to the user.

:( Only supports Mali GPUs. ● ● ●

A point of competitive advantage for ARM. But can we consider technology transfer complete?.. Paraphrasing Leonardo da Vinci: “Technology transfer is never complete, only abandoned”. (He was talking about art.)

Two sides of OpenCL reliability coin ● ● ● ●

GPUVerify is a tool for finding bugs in user programs. But user programs call into vendor drivers. Vendor drivers contain bugs too. In particular, in-the-driver-compilers (like any compilers) are bug-prone. ● (No chain is stronger than its weakest link.)

OpenCL compiler bugs: effects Embarrassing for implementors, debilitating for users ● Compiler fails due to internal error ●

Whole application crashes.

● Compiler fails to generate code ●

Specification is misinterpreted (by implementors or by users).

● Compiler silently generates incorrect code ● ● ● ● ● ●

Machine crashes. Display freezes (GPU). Whole application crashes. Program crashes. Program produces incorrect results. Program is inexcusably slow (performance bugs).

OpenCL compiler bugs: experience ● Can affect hardware validation (high risk in IP business). ● Can appear and disappear without any obvious changes (hint: think of changes in third-party IP e.g. LLVM). ● Can go undiscovered for years. ● Can lead to a backlash from users, sometimes publicly. ● Can result in “customer support cases from hell”.

On compiler bugs "Any compiler, for example, is going to have at least “pathological” programs which it will not compile correctly. What is pathological, however, depends to some extent on your point of view. If it happens in your program, you hardly classify it as pathological, even though thousands of other users have never encountered the bug." -- Gerald M. Weinberg, “The Psychology of Computer Programming” (1971), p. 19

Bayesian perspective on testing : compiler has no bugs (“no bugs”) : compiler passes all tests (“tests pass”) : (prior) probability of “no bugs” : probability of “tests pass” : probability of “tests pass” given “no bugs” : (posterior) probability of “no bugs” given “tests pass”

Bayesian perspective on testing

Given that our tests pass, our posterior belief that our compiler has no bugs depends on: 1) our prior belief in this, and 2) the probability that our tests pass even though our compiler has bugs (the quality of our tests).

Bayesian perspective on testing The quality of a test suite is subjective, say: : average test suite (conservative) : amazing test suite : appalling test suite

Bayesian perspective on testing

The key takeaway from the graph For complex software like compilers, your tests must be “amazing” to catch bugs, even after trivial changes. OpenCL compiler test sources ● ● ● ● ●

Internal tests (frontend, backend, built-in function library). Conformance tests (compiler tests are easy to pass if internal tests are thorough). Benchmark suites (open source, commercial). Real-world applications. Generated tests (in this talk).

CLsmith ~ Csmith for OpenCL ● Similar to Csmith (PLDI’11), generates code having no undefined or implementation-defined behavior. ● Additionally covers vector operations, work-item functions, barriers, atomic sections, atomic reductions. ● Random differential testing (RDT) ○ ○ ○ ○ ○

Generate a random test program. Compile the program using two or more different compilers or the same compiler with different flags (e.g. -cl-opt-disable off/on). Execute the program, collecting output results. By construction, the results should match. If they don’t, one of the compilers miscompiles the program.

CLsmith: pros and cons :-) Effective tool for discovering subtle bugs. :-) Checking the outputs is better than ensuring no crashes. :-( Manual reduction of test cases is laborious: CL-reduce ~ C-reduce (PLDI’12) for OpenCL is work-in-progress. :-( Generator can keep hammering the same bug: when comparing two compilers, a higher percentage of test failures does not mean more bugs. :-( Not all bugs are equal: a bug inducing a high percentage of test failures, may never occur in real programs.

Summary: Imperial OpenCL tools :-) State-of-the-art in OpenCL testing. :-) Strong pragmatic focus. :-) Permissive licensing (2-clause BSD). :-( As any research tools, somewhat patchy and flaky. :-) Open to collaboration and contributions.

Making testing practical ● Making generated tests easy to run and to replay. ● Aggregating results from multiple platforms - essential for the CLsmith approach. ● Managing multiple test versions (e.g. generated by CLsmith and reduced by CL-reduce). ● Crowdsourcing new tests and experiments.

A perfect fit for Collective Knowledge!

Collective Knowledge: a framework for rigorous, reproducible, collaborative R&D with predictive analytics

Collective Knowledge: a framework for rigorous, reproducible, collaborative R&D with predictive analytics

● ●

cknowledge.org/repo github.com/ctuning/ck

On compiler bugs triage "...The producer of the compiler must make some evaluation of the errors on the basis of the number of users who encounter them and how much cost they incur. This is not always done scientifically. Indeed, it often amounts to an evaluation of who shouts the loudest, or who writes to the highest executive..." -- Gerald M. Weinberg, “The Psychology of Computer Programming” (1971), p. 19

Exciting future work ● Analysing results to classify failures and to determine relative importance of bugs. ● Pinpointing where a failure occurs in the compiler? ● Checking whether a user has hit a known bug?

Also a perfect fit for Collective Knowledge!

Using decision trees to triage bugs ● Design space exploration of SLAMBench parameters on a new platform resulted in 24 failures out of 100 samples. ● 22 failures (the right hand child of the root) can be readily explained by out of memory errors (“v > 384” is too high for this platform). ● 2 failures warrant a closer look: ~10x reduction.

My definition of transfer to industry ● Taking your research that you keen to see used and developed further and… ● Giving it a kick into the real world… ● While not really expecting to gain financially. ● (Otherwise, you should ask yourself, why not start your own company?)

My two rules of technology transfer 1. Corroborate. “In industry, we ignore the evaluation in academic papers. It is often wrong and always irrelevant.” — Head of a major industrial lab. (Quoted in: Jan Vitek, Tomas Kalibera. EMSOFT’11.)

2. Collaborate. “Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats.” — Howard H. Aiken. (Attributed to)

1. Corroborate ● Accept that rigorous evaluation and reproducibility are essential for research excellence. ○ ○

Assume yours ideas are worthless until they are properly evaluated. Humble workshops that follow this agenda are more likely to be attended, checked for papers and even sponsored by industry.

● Maintain integrity. ○

Be vocal of limitations of your technology as well as of its benefits.

On technology transfer "The concreteness of the idea is essential. In the words of MIT’s David D. Clark: ‘One artifact is worth a thousand papers.’ The prototype not only demonstrates feasibility, it also demonstrates a potential new product in an application context. If the technology embodying the idea cannot be demonstrated at the outset, any company founded to exploit the idea is likely to find itself doing a lot of fundamental research." -- C. Gordon Bell, “High-tech ventures” (1991), p. 107

2.A Collaborate ● ● ● ●

Earn trust and develop relationships with industry. Create a push from academia and a pull into industry. Just like Alastair Donaldson and I have done. The following quote describes our relationship spot on!

On collaboration "Collaboration is the natural by-product of leaders who are: ● Passionately curious - who crave new insights and suspect that others have them. ● Modestly confident - who can bounce ideas off brilliant collaborators, without turning it into a competition. ● Mildly obsessed - who care more about the collective mission than about how achieving it will benefit their personal fortunes.

… and may be the only leadership mode that produces breakthrough results.” -- John Abele, “Bringing minds together” (HBR, 2011)

2.B Collaborate ● Embrace proper dissemination and exploitation. ○ ○

○ ○

Resist imposing restrictions on early stage technology. Think about your collaborators’ potential needs to: ■ Use the software “as-is”. ■ Modify the software (e.g. fix bugs or enhance). ■ Distribute the software or any derivative on its own or in a product. ■ Contribute to the software (e.g. patches). Use permissive licences for your own components. Prefer third-party components with permissive licenses (e.g. BSD, MIT, Apache 2.0).

BSD-3-clause.txt Copyright (c) [YEAR] [OWNER] Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the copyright holders nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

MIT.txt Copyright (c) [YEAR] [OWNER] Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

diff BSD-3-clause.txt MIT.txt • License: Use the software • Commercial use: Permitted • Restrictions: • Keep notice provisions • You cannot use the name of the licensor to endorse products • Patents: No explicit patent license on the inbound side, but one is implied on the outbound side. However, a patent license would only be triggered if the BSD-licensed software is distributed by the licensee. • Confidentiality and security: None • Warranties: None • Consequential losses: Disclaimed • Licensor limitation of liability: Disclaimed • Licensee limitation of liability: Silent • Licensee subcontractor provisions: OK • Licensee affiliates/subsidiaries: OK • Choice of law: None

• License: Use the software • Commercial use: Permitted • Restrictions: • Keep notice provisions

• Patents: No explicit patent license on the inbound side, but one is implied on the outbound side. However, a patent license would only be triggered if the MIT-licensed software is distributed by the licensee. • Confidentiality and security: None • Warranties: None • Consequential losses: Disclaimed • Licensor limitation of liability: Disclaimed • Licensee limitation of liability: Silent • Licensee subcontractor provisions: OK • Licensee affiliates/subsidiaries: OK • Choice of law: None

Case study 1: LLVM ● (2000) Chris Lattner starts working on LLVM at UIUC. ● (2003) LLVM is released under a BSD-style licence. ● (2005) Apple hire Chris Lattner to bring LLVM to production quality for use in their products. ● (2007) Apple stop contributing to GCC when its licence changes to GPLv3 having copyleft /anti-tivoisation clauses. ● (Now) LLVM has many active contributors from ARM, Google, Qualcomm, Intel and a great many other companies, while GCC is losing favour...

Case study 2: GPUVerify ● Most components under permissive licenses (Microsoft Public License, MIT, BSD). ● Support for different theorem provers (third-party components): ○ ○

Z3 under Microsoft Research License that allows only non-commercial use – a no-go even for an evaluation by a company. CVC4 under modified BSD – was essential for enabling collaboration between Imperial and ARM.

● Still too many different components under different licenses: GPUVerify is a separate download to Mali Graphics Debugger. (I say integrated “with”, not “into”.)

Case study 3: Collective Knowledge Brainchild of Grigori Fursin (http://fursin.net): ● (2007-2009) cTuning-1 (FP6 MILEPOST): GPLv2. ● (2010-2011) cTuning-2: Intel proprietary. ● (2012-2013) cTuning-3 (Collective Mind): BSD. ○

Again, having the software released under a permissive license was important for enabling collaboration between INRIA and ARM.

● (2014-now) cTuning-4 (Collective Knowledge): BSD. ○ ○

For a fledgling startup, giving away its core software for free, with no strings attached, would normally be considered foolhardy... But we believe it is essential in our crusade for reproducible and collaborative R&D!

Do join us in our crusade for reproducible and collaborative R&D!

Acknowledgements HiPEAC CARP (FP7) MILEPOST (FP6) TETRACOM (FP7)

Questions?

Thank you! [email protected]