I think it’s sometimes useful to write about projects (or just things) that are currently ongoing. It’s nice to get feedback on what you’re doing, and there’s no better way to learn something than by explaining it to others (people call this the Feynman method, after Richard Feynman, but I think people have been doing it for a very long time). In any case, here’s a brief rundown of the projects I’m currently working on.
1. Jupyter on the Watzek Cloud

This project falls under the “Watzek DI” umbrella. Throughout my other recent work with Jeremy and Jens, Jupyter has been something of a constant. Jupyter is a component of our BLT architecture, and I view it as one of the most important new ways to interact with scientific computers; as a “gateway,” I’ve found it makes even people unfamiliar with computing at that scale feel at home. Our paper “Jupyter Notebooks and User-Friendly HPC Access” (obviously) made heavy use of Jupyter notebooks. In December, we hosted a guest on campus, Doug Blank from Bryn Mawr College, whose specialties include the use of Jupyter. We’ve noticed there is real energy on campus around Jupyter, from people in Econ and Political Science looking for a friendly interface to our HPC resources, all the way to physicists driving the entire output of our BLT system at once through Jupyter. One thing we haven’t done as well is give people a nice sandbox through which they can interact with our Jupyter servers without any complex VPN connections. Now, though, LC-affiliated people can access a free Jupyter and RStudio installation at https://datasci.watzek.cloud. We’re looking to continue developing this as a resource for the school, so if there are any more features you’d be interested in, please let me know.
2. Task-Aware HPC Scheduler
My next large research project (which I have finally begun work on) is an idea I’ve been carrying around for quite a long time. A few years ago, I came up with the concept of a scheduling system for an HPC computer that would have a deeper understanding of the kinds of tasks running on it. That was (and still is) an incredibly vague goal, but I thought it would be interesting and might influence the way people design schedulers. This project has already gone through many iterations, simply because it needs to be a well-formed enough challenge to get started with. Now, let me back up and try to explain the problem.
Imagine you are operating a computer cluster with no scheduling system. You’re a sad sysadmin who spends all your waking hours telling your users which cores of the cluster they can use at which times. They have different CPU allotments and different needs, and there are always more tasks waiting in line behind them. How do you decide who gets to use which parts of the machine? This is a genuinely difficult decision to automate, and what generally happens now is essentially a combination of two rules: 1. certain users have priority over others, and 2. whoever has been waiting in line the longest goes next. This considers nothing about the tasks themselves.
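To make the status quo concrete, here is a minimal sketch (not any real scheduler’s implementation) of those two rules combined: a priority queue where lower priority numbers go first and ties fall back to first-come, first-served. The job names and priority values are invented for illustration.

```python
import heapq

class NaiveScheduler:
    """Toy queue: user priority first, then FIFO among equals."""

    def __init__(self):
        self._heap = []
        self._arrival = 0  # monotonically increasing arrival counter

    def submit(self, job_name, priority):
        # Lower number = higher priority; the arrival counter
        # breaks ties in favor of whoever has waited longest.
        heapq.heappush(self._heap, (priority, self._arrival, job_name))
        self._arrival += 1

    def next_job(self):
        _, _, job_name = heapq.heappop(self._heap)
        return job_name

q = NaiveScheduler()
q.submit("alice-sim", priority=2)
q.submit("bob-render", priority=1)
q.submit("carol-stats", priority=2)
print(q.next_job())  # bob-render: highest priority wins
print(q.next_job())  # alice-sim: tied with carol, but queued first
```

Notice that nothing here asks what the job actually does, which is exactly the gap this project is aimed at.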
An interesting approach to this problem (and the one I am pursuing) is to learn how a task is likely to behave, and then use that information to plan the best way for all users to share the available computational resources. Specifically, I am attempting to guess how a task will be constrained (for example, some tasks are memory-constrained while others are CPU-constrained) and to run differently constrained tasks together. If a task is memory-constrained, it might be a good candidate to run alongside a CPU-constrained one: the CPU-bound task doesn’t need much memory, and the memory-bound task doesn’t need much CPU.
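The pairing idea above can be sketched very simply. This is a hypothetical illustration, not the project’s actual code: assume each task has already been labeled (or predicted) as memory-bound or CPU-bound, and greedily match one of each kind to share a node. The task names and labels are made up.

```python
from collections import deque

def pair_tasks(tasks):
    """Greedily pair memory-bound tasks with CPU-bound ones.

    tasks: list of (name, kind) where kind is "memory" or "cpu".
    Returns (pairs, leftovers); each pair could share a node since
    the two tasks' resource demands complement each other.
    """
    mem = deque(name for name, kind in tasks if kind == "memory")
    cpu = deque(name for name, kind in tasks if kind == "cpu")
    pairs = []
    while mem and cpu:
        pairs.append((mem.popleft(), cpu.popleft()))
    leftovers = list(mem) + list(cpu)
    return pairs, leftovers

tasks = [
    ("genome-assembly", "memory"),
    ("mc-simulation", "cpu"),
    ("graph-analytics", "memory"),
    ("raytracer", "cpu"),
    ("md-dynamics", "cpu"),
]
pairs, leftovers = pair_tasks(tasks)
print(pairs)      # complementary pairs share a node
print(leftovers)  # unpaired tasks wait or run alone
```

The hard part, of course, is the prediction step this sketch takes for granted: knowing a task’s constraint before (or shortly after) it runs.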
3. Increasing BLT Visibility/Broadening HPC on LC Campus
We in the DI/LC HPC initiative are very interested in getting “non-traditional” HPC users involved with HPC on campus. We have a large number of HPC users who are biologists, physicists, mathematicians, and computer scientists. That is to be expected: these are the kinds of people who have ready-made tasks that slot right into an HPC system. Now that we have a reasonable number of users, we’re interested in broadening our horizons and getting people on board who don’t fit the “HPC user” archetype: political scientists, economists, and other social scientists, for example. Many of them may have an HPC use case, like massive economic or political datasets. We would like to get these people involved with the HPC initiative a) to hopefully make their lives easier, and b) so that we can build a real HPC community on campus and share stories about how we’re using HPC (an ever-more important domain) to solve problems in all kinds of fields, not just the ones traditionally associated with it. As part of this initiative, we are trying to collect examples of interesting HPC uses. Some of these include the LIGO gravitational-wave code, datasets about rents in the San Francisco Bay Area, topic modeling, and procedural generation of text that “sounds like” a certain author wrote it.
Please let me know if you have interesting ideas for how HPC can be used (or demonstrated) in a fun way which would be interesting to humanities-type people, artists, musicians, social scientists, and anyone else who is interested in a) computers and b) something not traditionally used in HPC.
4. Getting Good at Basketball
This last one is sort of a joke, but sort of not. My friends and I have started a basketball team called the Supreme Court and I would like to actually be at least a serviceable member of the team. I’ve been playing basketball every day since the semester started and honestly I already feel physically much healthier. I can run for much longer without feeling tired and I feel like I have a lot more energy. Plus, basketball is just fun to play.