August 04, 2008
When he takes the stage for his keynote speech at the Next Generation Data Center conference, Kevin Clark, director of IT operations for Lucasfilm, says he just wants to share what goes on behind the scenes at the entertainment company that has played key roles in everything from "Star Wars" to "Pirates of the Caribbean." From film-rendering systems to storage challenges to taking advantage of virtualization technologies, Clark will share a little of the pain that comes with managing a demanding IT environment, but also a little of the fun in the form of seeing the end products of this hard work.
GRIDtoday spoke with Clark to get the lowdown on just these topics. In this interview, Clark tells just how critical grid technologies and high-performance networking are to Lucasfilm's movie-production work, as well as how everyday tasks like testing new services have been made far more efficient through virtualization. He also discusses future plans around power conservation, and the uncertainty of coping with ever-increasing production-side demands.
GRIDtoday: What technologies drive the production process at Lucasfilm, especially in regard to grid computing, high-performance computing, etc.?
KEVIN CLARK: There are a number of divisions that live under Lucasfilm. Most everyone is aware of them, but maybe not the breadth of their exposure: Industrial Light and Magic (ILM), obviously, is doing special effects; LucasArts is driving our games development; and Lucasfilm Animation is a fairly new entity that's doing episodic TV and animated film work -- we've got a film called "Clone Wars" coming out in a couple of weeks.
There are some common trends, especially within the special effects and animation divisions, which drive our backend. We're based, especially at ILM, on a Linux platform, so we use a lot of proprietary and open source applications for content creation. When it comes down to the rendering process, putting all those bits together, we've got a 4,500-processor render farm in the datacenter that's hosted here, and we use an extended, or distributed, rendering model, so we're also using workstations during off hours. We've got a proprietary scheduler that we use, which is always one of the first questions that comes up because everyone wants to be able to replicate that. We expand to about 5,500 available processors during off hours.
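The scheduler itself is proprietary, and Clark doesn't describe its internals, but the off-hours expansion he quotes is easy to picture. Here is a minimal sketch in Python, assuming a hypothetical off-hours window and slot counts drawn from the figures above; none of this reflects Lucasfilm's actual code:

```python
from datetime import datetime, time

# Assumed figures from the interview: ~4,500 farm processors always on,
# ~1,000 workstation processors joining the pool off hours.
FARM_SLOTS = 4500
WORKSTATION_SLOTS = 1000
OFF_HOURS_START = time(20, 0)  # 8 p.m. -- an assumed window
OFF_HOURS_END = time(6, 0)     # 6 a.m.

def is_off_hours(now: datetime) -> bool:
    """True when artist workstations are free to join the render pool."""
    t = now.time()
    return t >= OFF_HOURS_START or t < OFF_HOURS_END

def available_slots(now: datetime) -> int:
    """Pool grows from ~4,500 to ~5,500 processors during off hours."""
    return FARM_SLOTS + (WORKSTATION_SLOTS if is_off_hours(now) else 0)

def dispatch(frames: list[str], now: datetime) -> tuple[list[str], list[str]]:
    """Start as many queued frames as the current pool allows;
    the remainder stays queued for the next scheduling pass."""
    capacity = available_slots(now)
    return frames[:capacity], frames[capacity:]

running, queued = dispatch([f"shot_001.frame_{i:04d}" for i in range(6000)],
                           datetime.now())
```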
In terms of the technology itself, we are an AMD shop. Our render nodes -- the blade racks -- run dual-core, dual-socket Opteron chips with 32GB of memory on board, but we're expanding those to quad-core as we speak. We've got a fairly solid plan in place, now that AMD has caught up with its timeline, in terms of our expansion capabilities.
We've got just about 400TB of storage online for production, and that's also a challenging environment to manage because of the rate at which data changes. Every night, we're writing out 10-20TB of new data during a render -- the shots are pretty sizeable -- and a project will use up to a hundred-plus terabytes of storage. When you're doing incremental backups on data that changes up to 50 percent over a week, that's a pretty significant amount of change. We've got other challenges in the backup environment, as well.
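A quick back-of-the-envelope calculation, using only the figures Clark quotes, shows why that churn strains incremental backups (all numbers are assumptions taken from the interview):

```python
# Rough weekly backup arithmetic from the figures above (assumed).
project_size_tb = 100          # a large project's storage footprint
weekly_change_rate = 0.50      # up to 50% of project data changes per week
nightly_new_data_tb = 15       # midpoint of the quoted 10-20TB per night

# Incremental backups copy only changed data, but at this churn a single
# project's weekly incrementals approach the size of a full backup...
weekly_incremental_tb = project_size_tb * weekly_change_rate   # 50.0 TB
# ...while the farm also writes roughly 7 nights x ~15TB of new renders.
weekly_new_renders_tb = 7 * nightly_new_data_tb                # 105 TB

print(f"incrementals per project: ~{weekly_incremental_tb:.0f}TB/week")
print(f"new render output:        ~{weekly_new_renders_tb}TB/week")
```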
For our OS, we're running on a SUSE platform, but we're looking to change that in the near future. We do a lot of open source work using our own code base, and it's always a little bit of a moving target because the production requirements change and then we have to change to accommodate the production requirements.
Gt: You mentioned you're using a proprietary scheduler. Are you using any off-the-shelf components for the storage or database tiers?
CLARK: For storage, we're a NetApp shop; we're using the NetApp [ONTAP] GX line. We're doing that for a number of reasons. Certainly, having a global namespace in the virtual file system technology works well for us.
In terms of what we're doing on the rendering side, we've traditionally worked with tier-two vendors, and we've gone through a long line of them to build to our spec. I'm actually hoping to change that model; the tier ones have never been able to play in this space. Currently, we're working with Verari on that front.
This all runs over our network, which is primarily built on a Foundry [Networks] architecture. When we first built up this facility, I think we were one of the larger 10GbE-backbone facilities on the West Coast. We've got 350-plus 10GbE ports that we use for distribution throughout the facility and the backend. We move a lot of traffic.
Gt: What aspects are most important in the rendering process? Processing? Storage? Network? Does one take precedence?
CLARK: The key resources that we depend on are processors and memory. Bandwidth is also always an issue ... we overbuilt this network when we moved in and we still haven't reached capacity. That's out of necessity -- you don't want to cut that too close to the edge. In terms of storage, our challenge is always just managing the data pools so we don't overheat a specific filer head. Once you reach those I/O constraints, it creates a bottleneck if everything is trying to reach the datasets through one filer head. [Why send it through one] when you can distribute it over 24? It's a moving target in terms of moving data around the backend so we're distributing that load effectively.
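One way to picture that distribution problem is spreading dataset paths across filer heads so no single head absorbs all the I/O. A hedged sketch follows -- the hash placement below is purely illustrative, not how ONTAP GX's global namespace or Lucasfilm's tooling actually balances load:

```python
import hashlib
from collections import Counter

NUM_FILER_HEADS = 24  # the figure Clark quotes

def head_for(dataset_path: str) -> int:
    """Map a dataset to one of 24 filer heads by hashing its path,
    so reads and writes don't all funnel through a single head."""
    digest = hashlib.md5(dataset_path.encode()).hexdigest()
    return int(digest, 16) % NUM_FILER_HEADS

# Shots scatter across heads instead of piling onto one hot filer.
shots = [f"showA/seq01/shot_{i:03d}" for i in range(1000)]
load = Counter(head_for(s) for s in shots)
print(load.most_common(3))  # roughly even counts across the 24 heads
```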
Gt: For how long has Lucasfilm been utilizing grid computing?
CLARK: I've been with the company four and a half years, and we've been operating in this mode for at least that long.
One interesting story from when we moved in [to the new facility]: We actually had a distributed grid where we were connected back to our home office in San Rafael over a 10-Gig dark fiber connection, and we were able to co-render and share storage between the two facilities, as well. We had artists based up in San Rafael doing their work and scheduling their renders, and they would see no difference in performance in terms of where they went to look for their data and their shots -- which was pretty impressive at the time. It bailed us out of having to shut the company down for two weeks in the middle of working on "Star Wars: Episode III."
We try to simplify it as much as possible because of the loads we throw at it, and it's been very healthy for us and very stable.
Gt: Have you undertaken any different or advanced uses for the technology in your time there?
CLARK: We've certainly enabled the companies to do more work. We've seen the workloads increase significantly based on the complexity of the work that we do. Look at the movies we did five or six years ago with water work -- "The Perfect Storm" would be an example, where you had that big wave -- and compare them with current-day movies, whether it's "Pirates of the Caribbean" with all its water work or "Poseidon," a movie about a ship in the ocean where there was physically no ship and no ocean -- that was all done digitally.
I think the biggest change has probably just been in the amount of resources required, but that's putting an increasing load specifically on the filers and their ability to write and read data out fast enough to meet the demand. We're giving workstations out to artists now -- HP 9400 workstations with dual-core dual Opteron processors and 16GB of memory -- so we're putting servers out on the floor now, effectively. It's sort of an escalating model, and our challenge now is how we can better segment storage so that we don't continue to sink costs into high-cost disks. There are always challenges in the pipeline to make sure the code base is best utilizing your resources, as well.
Gt: From an overall business perspective, how important are technologies like grid computing and high-performance storage to Lucasfilm?
CLARK: Critical. This company is based on creative advancement, and that drives the technology advancement. But without the tech piece, we wouldn't be enabling the artists to do what they do today. Back in the day when "Star Wars" was first done, a lot of it was based on models. The Industrial Light and Magic model shop was world renowned for the work that they did, working with miniatures and such. We rarely do that type of work anymore. In fact, we've sold off that model shop; it's an independent entity. We do some work with it, but the vast majority is now CG-driven.
From the technology side, if the backend systems aren't working, this facility truly does stop.
Gt: You mentioned that you would do some things differently with the new facility. Can you elaborate on that a little?
CLARK: When we moved here, we brought a number of the divisions together. They used to be distributed throughout Marin County. We brought Lucasfilm, LucasArts and ILM, among others, down into the one building. Our datacenter resources were consolidated -- we moved from smaller datacenters into a 10,000-square-foot datacenter -- which puts certain challenges on my team. But it also really enables us to be more efficient in how we deliver services out to the divisions. We've got common authentication services; common storage devices, so we can share data across the divisions as needed and they can be more effective in how they collaborate; and, at times, when we are resource-constrained -- let's say we're trying to get a shot done for Industrial Light and Magic -- we will talk to LucasArts about how we can leverage its workstations off hours to help render some of those shots, booting them from a Windows environment into a Linux environment so they can be part of the render pool. I think the economies of scale certainly benefit us in terms of having more resources in a central location.
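To picture the off-hours reclamation Clark describes, the orchestration might look something like the sketch below. The host names, the remote reboot command, and the netboot mechanism are all assumptions for illustration, not Lucasfilm's actual tooling:

```python
import subprocess

# Hypothetical LucasArts workstation host names.
WORKSTATIONS = ["la-ws-001", "la-ws-002", "la-ws-003"]

def reboot_into_linux(host: str) -> None:
    """Ask a Windows workstation to reboot; an assumed PXE/netboot
    setup then brings it back up as a Linux render node."""
    subprocess.run(["ssh", host, "shutdown /r /t 0"], check=False)

def register_with_render_pool(host: str) -> None:
    """Placeholder for telling the scheduler a node is available."""
    print(f"{host} joined the render pool")

# At the start of the off-hours window, reclaim the whole floor.
for ws in WORKSTATIONS:
    reboot_into_linux(ws)
    register_with_render_pool(ws)
```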
Gt: Have you considered accessing computing resources externally, or creating an internal pool or cloud of resources?
CLARK: We've always been fairly internal-facing in terms of our resources. We're in a very cyclical industry -- for the summer release cycle, we do all our work in the wintertime and the spring -- and sometimes when things change on a production we get additional work. If we're really stretched at the resource level, we need to get creative at times. To date, we haven't worked with anything externally, although that's always an option.
Gt: What about internally? Have you looked at an internal cloud or utility model to facilitate resource sharing among the divisions?
CLARK: We have, but we're not there yet in terms of being able to benefit from that at the organizational level. It's something that we go through and we respond as needed. When that becomes something that's truly viable and we can make a case for it and everyone will benefit, we'll pay more attention to it. We really haven't given it a lot of thought.
Gt: What about from a day-to-day perspective? What non-production technologies -- virtualization, for example -- is Lucasfilm utilizing?
CLARK: We're doing more and more with virtualization. We've been fairly specific about which services we wanted to virtualize. A good example is where we've got disparate databases, let's say in FileMaker -- at one point when I first came here, we had 10 FileMaker servers throughout the enterprise -- and we figured out how to consolidate those down into two or three. We're using VMware to host a lot of development environments. So, instead of taking three weeks when someone calls up and says, "Hey, I've got a new service that I want to test. I need to get it out there," we can turn that around in half an hour in a virtual server environment. We're doing more and more on the production front there, as well. Where it makes sense, we're absolutely working to limit the physical resources and virtualize them. To date, we've been really successful at that, so I think the next year is when we can really look at pulling in that next level of services where we've had less of a comfort zone in the past because of the uncertainty. We've probably gone through the same challenges everyone else has in terms of how you secure that data, how you back it up and manage it.
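The half-hour turnaround Clark mentions comes from cloning prebuilt templates rather than racking hardware. Here is a simplified sketch of that provisioning step; the template name, sizes, and DevVM structure are illustrative assumptions (in practice this maps to cloning a VMware template and powering the clone on):

```python
import uuid
from dataclasses import dataclass

@dataclass
class DevVM:
    name: str
    template: str
    vcpus: int
    ram_gb: int

def provision_dev_vm(service: str, template: str = "dev-linux-base") -> DevVM:
    """Clone a development environment from a prebuilt template --
    the step that turns a three-week hardware request into a
    half-hour virtual-server request. (All parameters assumed.)"""
    return DevVM(name=f"{service}-{uuid.uuid4().hex[:6]}",
                 template=template, vcpus=2, ram_gb=4)

vm = provision_dev_vm("new-service-test")
print(f"provisioned {vm.name} from template {vm.template}")
```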
On the networking front, we've worked to consolidate all of our networks. We're providing PoE out from the switch to all of our Web farms now, which has worked fairly well for us.
The big piece that we've really benefited from was database consolidation on the virtual front, as well as development environments. Those requests come out of nowhere; every IT shop deals with them, and they would benefit just by putting a small cluster, or even a single server, out there with VMware on the development side rather than having to spin up a new piece of hardware for every single request.
Gt: What about other "next-generation" technologies? Have you looked at automation, advanced virtualization management, or anything like that?
CLARK: We haven't focused there as much. Quite honestly, for us, I see a big push next year on more of the facilities side, on how we can be more efficient, and on the environmental piece of the datacenter. How can we be more effective in airflow management or power utilization? There are a lot of fairly simple steps you can take to get there. Especially with our render nodes -- we're pulling quite a bit of power on those, which produces quite a bit of heat.
I think one part of our focus will really be on the environmental side, and also continued integration across platforms. A big challenge for us is how we bring our Linux environment more in line with the corporate environment -- authentication via Active Directory, shared calendaring, common work tools. That's really where we're hearing more noise right now and where there's a bigger issue for us, so that's probably where we're going to focus more, on the productivity side.
Gt: Are your efficiency goals driven by eco-consciousness, power constraints, or some combination of factors?
CLARK: It's a combination. In some ways, it's a pet project for me; in other ways, there's a real business case. As a shared-service group, I allocate costs back out to all the divisions. I get to sit down with the head of a division and explain why their power bill is so high, and that can be difficult to explain sometimes. So, we're really trying to do the right thing for the business in reducing those costs, but we're also acting out of environmental consciousness and working to do the right thing. We're not constrained in any way, currently, by power and cooling, but if you look at the curve in terms of our adoption of resources over time, that could be a completely different story in five years. We're just trying to stay ahead of that curve.
Gt: What trends and technologies do you see becoming more important to Lucasfilm, and movie studios in general, in the years to come? How will production look 10 years from now compared to today?
CLARK: That's a good question. This is an industry that hasn't necessarily been quick to change, but George [Lucas], years ago, was really the first one to get out and advocate the use of digital production techniques. Everyone used to shoot on film, and you see more and more productions shooting digitally now. That's essentially cleaned out part of the pipeline, because you're dealing with one less set of media. Certainly, digital distribution out to theaters is something that's been talked about for years. I think there are many benefits once you address just the physical security portion of that. In terms of what consumers will see, there's also going to be a continued push into the home. That's always been a target of the studios, as well as, I think, Microsoft and Sony, if you look at their gaming strategies.
From my perspective on how things will change, that's a good question, because we're getting close to a point of needing more resources to continue to do the work. We work under tighter constraints and we have shorter schedules to get very complex work done, which puts increasing stress on both the systems and the individuals who run them. I'm not sure how that is going to change, but there are all kinds of things I can't really talk about -- advances on the creative side -- that are going to continue to push the medium of film, whether it's motion-capture technologies, digital doubles or things like that. There's always been talk that actors can be anybody you want them to be based on what bit sets you put on the backend, and not so much the person.
We're dealing with, basically, providing a Formula 1 racecar to the artists and trusting them to drive it responsibly. We keep putting in bigger engines and making them go faster and faster, and at some point something has to change on how we drive that car. That's really what our challenge is today.
Gt: Is there anything else you want to add?
CLARK: I recognize that a lot of what we do is different, but a lot of the challenges we face are similar to just about everybody else out there. Maybe we've got more resources in some sense ... but it's pretty fascinating when you get behind the scenes and really see how the image from "Iron Man" or "Indiana Jones" that looks real but isn't real gets up on the screen. I'm hoping to at least give a little visibility into that and how we manage that today.