OpenAI today released Triton, an open source, Python-like programming language that enables researchers to write highly efficient GPU code for AI workloads. Triton makes it possible to reach peak hardware performance with relatively little effort, OpenAI claims, with as few as 25 lines of code yielding kernels on par with what an expert could write.
Deep neural networks have emerged as an important type of AI model, capable of achieving state-of-the-art performance across natural language processing, computer vision, and other domains. The strength of these models lies in their hierarchical structure, which generates a large amount of highly parallelizable work well-suited for multicore hardware like GPUs. Frameworks for general-purpose GPU computing such as CUDA and OpenCL have made the development of high-performance programs easier in recent years. Yet, GPUs remain especially challenging to optimize, in part because their architectures rapidly evolve.
Domain-specific languages and compilers have emerged to address the problem, but these systems tend to be less flexible and slower than the best handwritten compute kernels available in libraries like cuBLAS, cuDNN, or TensorRT. Getting peak performance out of a GPU means reasoning about factors such as memory coalescing, shared-memory management, and how work is scheduled within streaming multiprocessors, which can be challenging even for seasoned programmers. The purpose of Triton, then, is to automate these optimizations so that developers can focus on the high-level logic of their code.
“Novel research ideas in the field of deep learning are generally implemented using a combination of native framework operators … [W]riting specialized GPU kernels [can improve performance,] but [is often] surprisingly difficult due to the many intricacies of GPU programming. And although a variety of systems have recently emerged to make this process easier, we have found them to be either too verbose, lack flexibility, generate code noticeably slower than our hand-tuned baselines,” Philippe Tillet, Triton’s original creator, who now works at OpenAI as a member of the technical staff, wrote in a blog post. “Our researchers have already used [Triton] to produce kernels that are up to 2 times more efficient than equivalent Torch implementations, and we’re excited to work with the community to make GPU programming more accessible to everyone.”
According to OpenAI, Triton, which has its origins in a 2019 paper submitted to the International Workshop on Machine Learning and Programming Languages, simplifies the development of specialized kernels that can be much faster than those in general-purpose libraries. Its compiler simplifies code and automatically optimizes and parallelizes it, converting it into code for execution on recent Nvidia GPUs. (CPUs, AMD GPUs, and platforms other than Linux aren't currently supported.)
“The main challenge posed by our proposed paradigm is that of work scheduling — i.e., how the work done by each program instance should be partitioned for efficient execution on modern GPUs,” Tillet explains on Triton’s documentation website. “To address this issue, the Triton compiler makes heavy use of block-level data-flow analysis, a technique for scheduling iteration blocks statically based on the control- and data-flow structure of the target program. The resulting system actually works surprisingly well: our compiler manages to apply a broad range of interesting optimization automatically.”
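The "program instance" model Tillet describes can be illustrated without a GPU. The sketch below is a plain-Python simulation written under our own assumptions, not Triton's API: `add_kernel`, `launch`, and `add` are hypothetical names. Each instance handles one block of `block_size` elements and masks off out-of-range lanes in the last block. In real Triton, a kernel decorated with `@triton.jit` would use `tl.program_id`, `tl.arange`, and masked `tl.load`/`tl.store` calls for the same steps, and the compiler, rather than a Python loop, would decide how each instance runs on the GPU.

```python
# CPU-only simulation of a block-based, Triton-style vector add.
# Hypothetical illustration: none of these functions are part of Triton.


def add_kernel(pid, x, y, out, n_elements, block_size):
    """Work done by a single program instance, identified by pid."""
    block_start = pid * block_size
    # Element offsets owned by this instance (Triton would use tl.arange).
    offsets = [block_start + k for k in range(block_size)]
    # Mask lanes that fall past the end of the input in the last block.
    mask = [o < n_elements for o in offsets]
    # Masked "loads": masked-off lanes read a dummy 0.0 instead of memory.
    xs = [x[o] if m else 0.0 for o, m in zip(offsets, mask)]
    ys = [y[o] if m else 0.0 for o, m in zip(offsets, mask)]
    result = [a + b for a, b in zip(xs, ys)]
    # Masked "store": only in-bounds lanes write to the output.
    for o, m, r in zip(offsets, mask, result):
        if m:
            out[o] = r


def launch(kernel, grid, *args):
    # Run every program instance one after another; a GPU runs them in parallel.
    for pid in range(grid):
        kernel(pid, *args)


def add(x, y, block_size=4):
    n = len(x)
    out = [0.0] * n
    # One instance per block, rounding up (Triton's tutorials use triton.cdiv).
    grid = (n + block_size - 1) // block_size
    launch(add_kernel, grid, x, y, out, n, block_size)
    return out
```

For example, `add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], block_size=2)` launches three instances and returns `[11, 22, 33, 44, 55]`; the third instance masks off its second lane. Choosing how large each block should be, and how blocks map onto the hardware, is exactly the scheduling problem the Triton compiler tries to handle automatically.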
The first stable version of Triton, along with tutorials, is available from the project’s GitHub repository.
“This is a really exciting result,” says Edward Cackett, an astronomer at Wayne State University who was not involved with the study. “Although we have seen the signature of x-ray echoes before, until now it has not been possible to separate out the echo that comes from behind the black hole and gets bent around into our line of sight. It will allow for better mapping of how things fall into black holes and how black holes bend the space time around them.”
The release of energy by black holes, sometimes in the form of x-rays, is an absurdly extreme process. And because supermassive black holes release so much energy, they are essentially powerhouses that allow galaxies to grow around them. “If you want to understand how galaxies form, you really need to understand these processes outside the black hole that are able to release these enormous amounts of energy and power, these amazingly bright light sources that we’re studying,” says Dan Wilkins, an astrophysicist at Stanford University and the lead author of the study.
The study focuses on a supermassive black hole at the center of a galaxy called I Zwicky 1 (I Zw 1 for short), around 100 million light-years from Earth. Around supermassive black holes like I Zw 1’s, large amounts of gas fall toward the center (the event horizon, which is basically the point of no return) and tend to flatten out into a disk. Above the black hole, a confluence of supercharged particles and magnetic field activity results in the production of high-energy x-rays.
Some of these x-rays are shining straight at us, and we can observe them normally, using telescopes. But some of them also shine down toward the flat disk of gas and reflect off it. The I Zw 1 black hole’s rotation is slowing down at a higher rate than that seen in most supermassive black holes, which causes surrounding gas and dust to fall in more easily and feed the black hole from multiple directions. This, in turn, leads to greater x-ray emissions, which is why the black hole especially interested Wilkins and his team.
While Wilkins and his team were observing this black hole, they noticed that the corona, the region of supercharged particles above the black hole that produces the x-rays, appeared to be “flashing.” These flashes, caused by x-ray pulses reflecting off the massive disk of gas, were coming from behind the black hole’s shadow, a place that is normally hidden from view. But because the black hole bends the space around it, the x-ray reflections are also bent around it, which means we can spot them.
The signals were found using two different space-based telescopes optimized to detect x-rays: NuSTAR, which is run by NASA, and XMM-Newton, which is run by the European Space Agency.
The biggest implication of the new findings is that they confirm what Albert Einstein predicted in 1915 as part of his theory of general relativity: the way light ought to bend around gargantuan objects like supermassive black holes.
“It’s the first time we really see the direct signature of the way light bends all the way behind the black hole into our line of sight, because of the way [the] black hole warps space around itself,” says Wilkins.
“While this observation doesn’t change our general picture of black hole accretion, it is a nice confirmation that general relativity is at play in these systems,” says Erin Kara, an astrophysicist at MIT who was not involved with the study.
Despite the name, supermassive black holes are so far away that they really just look like single points of light, even with state-of-the-art instruments. It’s not going to be possible to take images of all of them the way scientists used the Event Horizon Telescope to capture the shadow of a supermassive black hole in galaxy M87.
So although it’s early, Wilkins and his team are hopeful that detecting and studying more of these x-ray echoes from behind the bend could help us create partial or even full pictures of distant supermassive black holes. In turn, that could help them unlock some big mysteries around how supermassive black holes grow, sustain entire galaxies, and create environments where the laws of physics are pushed to the limit.
We’ve spent the past few weeks burning copious amounts of AWS compute time trying to invent an algorithm to parse Ars’ front-page story headlines to predict which ones will win an A/B test—and we learned a lot. One of the lessons is that we—and by “we,” I mainly mean “me,” since this odyssey was more or less my idea—should probably have picked a less, shall we say, ambitious project for our initial outing into the machine-learning wilderness. Now, a little older and a little wiser, it’s time to reflect on the project and discuss what went right, what went somewhat less than right, and how we’d do this differently next time.
Our readers had tons of incredibly useful comments, too, especially as we got into the meaty part of the project—comments that we’d love to get into as we discuss the way things shook out. The vagaries of the edit cycle meant that the stories were being posted quite a bit after they were written, so we didn’t have a chance to incorporate a lot of reader feedback as we went, but it’s pretty clear that Ars has some top-shelf AI/ML experts reading our stories (and probably groaning out loud every time we went down a bit of a blind alley). This is a great opportunity for you to jump into the conversation and help us understand how we can improve for next time—or, even better, to help us pick smarter projects if we do an experiment like this again!
Our chat kicks off today, July 28, at 1:00 pm Eastern Time (that’s 10:00 am Pacific Time and 17:00 UTC). Our three-person panel will consist of Ars Infosec Editor Emeritus Sean Gallagher and me, along with Amazon Senior Principal Technical Evangelist (and AWS expert) Julien Simon. If you’d like to register so that you can ask questions, use this link here; if you just want to watch, the discussion will be streamed on the Ars Twitter account and archived as an embedded video on this story’s page. Register and join in or check back here after the event to watch!