A Big Bridge: High Performance Computing and the Humanities
This is the abstract of a paper to be presented at the 2008 Congress of the Humanities and Social Sciences
Authors: Geoffrey Rockwell - Associate Professor, Department of Communication Studies and Multimedia, McMaster University
Hugh Couchman - Professor, Department of Physics and Astronomy and Scientific Director of SHARCNET, McMaster University
How can humanities scholars use high performance computing (HPC) in their research? SHARCNET, an HPC consortium in Southern Ontario, and TAPoR have been building bridges between computing humanists and HPC. We propose a talk by Hugh Couchman, the director of SHARCNET, and Geoffrey Rockwell, the project leader of TAPoR. In a dialogue format, Couchman and Rockwell will address what HPC is and how Canadian HPC facilities are adapting to support humanities computing.
The challenges facing humanists using HPC facilities are many:
1 - Few humanists know what HPC is or how it is different from personal computing.
We will briefly review the history of supercomputing, or, as it is now known, HPC, in Canada. We will give examples of facilities across Canada and talk about services offered by HPC facilities, such as Access Grid conferencing, that may be of use to humanists. We will also provide concrete use cases where SHARCNET has supported humanities research.
2 - There are few examples of humanities research benefiting from HPC.
We will survey a range of international projects that have used HPC facilities as a way of describing the types of problems HPC can help with, in particular the processing of large data sets and visualization. We will show a text visualization of Mary Shelley's _Frankenstein, or, The Modern Prometheus_ developed at SHARCNET. This visualization project, called the Big See, will be presented to illustrate a successful collaboration.
3 - Few humanists are connected to the networks of researchers that typically administer HPC facilities.
We will provide information about Compute Canada, the nationwide organization supported by the Canada Foundation for Innovation (CFI), of which SHARCNET is a member, and we will discuss how digital humanists can get in touch with their regional consortium. Compute Canada is built on the C3.ca Association and has as its goal to "unite the academic high-performance computing (HPC) organizations in Canada." SHARCNET is one of its seven founding regional consortia.
4 - What can HPC support for the humanities look like in Canada?
SHARCNET is sponsoring a two-day workshop in April on HPC and the humanities to outline an agenda for collaboration. The workshop will bring HPC researchers and staff from SHARCNET together with computing humanists at SHARCNET institutions to develop real support collaborations. This presentation will share the results of that workshop with the larger community. We expect these results to take the form of concrete support mechanisms that involve digital humanists. There is, however, a parallel challenge that digital humanists need to take seriously: engaging HPC facilities and helping colleagues use the adapted facilities. If HPC facilities like SHARCNET reach out to humanists, SDH members should be the first to reach back. SDH, and other societies concerned with science and technology, should lead in imagining and experimenting with the use of HPC, and we should do so in a manner that explains what we are doing to colleagues. HPC facilities have received significant funding and are genuinely engaged in outreach. Digital humanists can help in that outreach to ensure that these facilities are well used by a broad range of researchers.
5 - The culture of HPC and the expectations about researchers' computer literacy set too high a bar for entry.
This is a real problem, and we propose to use this panel to suggest that digital humanists can serve as initial users of and guides to HPC facilities. We expect that organizations like the SDH will bridge the cultures, which is why we propose to bring this dialogue to SDH. We also acknowledge that both sides have to learn about and adapt to the other. As a SHARCNET/TAPoR experiment showed, HPC facilities are optimized for:
5.1 Queuing batch-oriented processes that need multiple processors. HPC users tend to view their work in terms of big questions that are prepared, queued and processed using the performance of the facility. Humanists, by contrast, tend to conduct iterative hermeneutical research that could be characterized as a sequence of smaller questions. Queuing smaller questions and waiting for results disrupts the interpretative process; digital humanists are used to resources like Google that give almost immediate feedback. There are, however, some interesting exceptions, like large-scale record processing, which we will survey in this dialogue.
5.2 Command-line operations that run with little software overhead so that HPC processes can fully utilize the facility. Web research resources need to be "always on" and therefore consume valuable HPC resources. Again, there are interesting exceptions, like visualization, which we will discuss.
5.3 Computation-intensive processes that can use parallel processing. Humanists typically have meagre and amateur programming support, so they are usually unable to take advantage of parallel processing even where it would be useful. One of the major outcomes of the April workshop will be an outline of how HPC facilities can be configured to support the modest research projects typical of the humanities.
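To illustrate the kind of parallel processing meant in 5.3, here is a minimal sketch of an "embarrassingly parallel" text task: word-frequency counting over a collection split into chunks. Each chunk is counted in a separate worker process (the map step) and the partial counts are then added together (the reduce step). This is our own illustration, not code from SHARCNET or TAPoR. The function names, the simple tokenizer and the use of Python's standard `multiprocessing` module are assumptions; on a real HPC facility such a job would be submitted through the facility's batch scheduler and spread across many more processors.

```python
from collections import Counter
from multiprocessing import Pool
import re

# Hypothetical tokenizer (an assumption): runs of lowercase letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def count_words(chunk: str) -> Counter:
    """Map step: count word tokens in one chunk of text."""
    return Counter(WORD_RE.findall(chunk.lower()))


def merge_counts(partial_counts) -> Counter:
    """Reduce step: add the per-chunk counts into a single frequency table."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)  # Counter.update adds counts rather than replacing them
    return total


def parallel_word_frequencies(chunks, processes: int = 4) -> Counter:
    """Count each chunk in a separate worker process, then merge the results."""
    with Pool(processes=processes) as pool:
        partial_counts = pool.map(count_words, chunks)
    return merge_counts(partial_counts)


if __name__ == "__main__":
    # Toy corpus standing in for a large collection split into files or pages.
    corpus_chunks = [
        "The cat sat on the mat.",
        "The dog sat by the door.",
    ]
    freqs = parallel_word_frequencies(corpus_chunks, processes=2)
    print(freqs.most_common(3))
```

The design works because each chunk can be counted independently and merging counts is cheap, so adding processors shortens the run almost proportionally. Iterative, interpretive questions of the kind described in 5.1 rarely decompose this cleanly.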
6 - But are there real needs for HPC in the humanities?
What if we built bridges to HPC and no one crossed them? What if there were no need in the humanities? Our dialogue starts from the premise that important problems in the humanities will soon need HPC. We live in an age of too much information. In a study titled "How Much Information?", researchers estimated that 5 exabytes of new information are being generated each year. The Wayback Machine (Internet Archive) claims to have 85 billion web pages archived for browsing. Google arguably runs one of the largest supercomputers, one used to search and concord the web, a service humanists already rely on. All researchers are dealing with an excess of publications and information in textual form. We believe this is a problem digital humanists and HPC specialists can work on together, and that together we can develop new tools for analyzing collections too large to read in traditional ways. Researchers who work with textual evidence, from literary studies to law, need HPC to help create custom aggregations for focused research that can be combined with analytical tools capable of large-scale text datamining. Access to large-scale textual resources is a "big problem" in the humanities that we can start solving collaboratively with existing HPC facilities if we are willing to engage in negotiation and dialogue.
In conclusion, this proposed panel will present a dialogue on the challenges of collaboration between the humanities and HPC facilities in Canada. We do not propose simply to describe a problem; we propose to discuss a concrete collaboration worked out between SHARCNET and digital humanists that may serve as a guide for other regional collaborations.
1. For more on Compute Canada see <http://www.computecanada.org/>. Their Long Range Plan <http://computecanada.org/__groups__/local.nic/LRP.pdf> includes a digital humanities scenario on page 84, "Real or Fake? Dynamically Validating the Reliability and Authenticity of Digital Records".
2. "How Much Information?" <http://www.sims.berkeley.edu:8000/research/projects/how-much-info-2003/>
3. Internet Archive <http://www.archive.org/>. The figure of 85 billion pages is as of December 4, 2007.