Project Summary

Science Studio: a Computer Network and Software for the Collection and Management of Scientific Data from Remote Sites

by
N. Sherry, J. Qin, M. Suominen Fuller, Y. Xie, O. Mola, M. Bauer and N.S. McIntyre
Faculty of Science, The University of Western Ontario, London, ON Canada, N6A 5B7
and
D. Maxwell, D. Liu and E. Matias
Canadian Light Source, University of Saskatchewan, Saskatoon SK S4P 4E4
and
C. Armstrong
IBM Canada

Abstract

Science Studio is a multi-partner software project that has developed web-services software that provides teams of scientists with secure links to experiments at one or more advanced research facilities. The software provides a widely distributed team with a set of controls and screens to operate, observe and record essential parts of the experiment. As well, Science Studio provides high speed network access to computing resources to process the large data sets that are often involved in complex experiments. The simple web browser interface and the rapid transfer of experimental data to a processing site allow efficient use of the facility and facilitate decision making during the acquisition of the experimental results. The software is intuitive, providing users with a comprehensive overview and record of all parts of the experimental process. A prototype network is described where one site is a beamline at the Canadian Light Source that makes measurements involving microscopic X Ray Fluorescence Spectroscopy and Laue X Ray Diffraction. A second synchrotron site is at the Advanced Light Source where Laue diffraction measurements are also taken. A third site is at a nanofabrication laboratory at the University of Western Ontario. The very large data sets produced by X Ray Diffraction provide a particular challenge to rapid assessment of the data. An on-line parallel processing facility has been developed that processes this data using InfoSphere Streams, a stream processing capability developed by IBM. As well, the processed data is viewable in near-real time by the scientific team. Science Studio should therefore make large science resources such as synchrotrons accessible to more users than has been the norm in the past.

Keywords: broadband network, remote access, stream computing, XRF, XRD, Laue diffraction, synchrotron, electron microscopy.

Introduction

Science Studio is a web services software package that links experiments underway at major science centers with collaborating team members across the world using the simplest of web browsers. As well, complex information from these experiments can be moved in real time to major computational centers for processing and viewing using a high speed network. At any time individual scientists can access their experimental records at all centers (sites) as well as transfer or process their data. A prototype version of Science Studio has been created involving experiments at two beamlines in different synchrotrons and an ion beam/scanning microscopy facility in a Canadian university. The software is open source. Most of it is available for download from the Science Studio website, http://www.sciencestudioproject.com, and a more detailed discussion of its operation is available at that address.

The synchrotron-based experiments linked to the users by Science Studio involve the observation, measurement and control of microscopic X ray fluorescence (XRF) spectroscopy and Laue X ray Diffraction (XRD) experiments. In the case of the ion beam experiment, the remote user is able to observe the Scanning Electron Microscope (SEM) images and acquire Energy Dispersive X ray (EDX) spectra. This paper first describes the appearance of the software to a typical user, then provides details on the architecture of individual components, as well as the network messaging software that integrates individual Science Studio sites. The resultant software provides a modern scientific team that has experiments underway at several facilities with a secure and efficient means of monitoring data as it is acquired, as well as a means of rapidly moving that data via a lightpath to a cloud-like site where it can be efficiently processed by advanced computational techniques.

The prototype sites chosen for this study have involved synchrotrons because many of the experiments conducted at such facilities produce large quantities of data that normally entail weeks or months of effort to produce a finished publishable result. Frequently, no analysis of the data is possible during the experiment itself and later analyses can be delayed by the lack of access to advanced parallel computation facilities. Science Studio as a concept is thus an attractive means for data management that not only provides group access to an experiment and its later processing, but also provides a secure and accountable record of all stages of the movement of the data to finished product.

(a) Remote Experimentation
Remote Acquisition of X Ray Fluorescence Data

The VESPERS beamline at the Canadian Light Source [1,2] provides the user with capabilities to acquire XRF and XRD maps of materials using a polychromatic microfocused X ray beam. XRF maps are acquired by rastering the sample in this beam using a mechanical stage; the fluorescent X rays are captured by a silicon drift diode (SDD) detector at each point in the raster. The energy dispersed XRF spectrum from the SDD detector is analyzed to provide a measurement of the elemental composition at each such point. Thus, spatial distributions of chemical elements within the material can be provided to the user. The individual spectra comprise 5-10 KB and XRF maps may consist of 50,000 or more spectral points.
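The figures quoted above imply a raw map size that is easy to check. The short calculation below is only a sketch; it assumes decimal kilobytes and ignores any file-format overhead, neither of which is stated in the text.

```python
# Rough raw size of one XRF map from the figures quoted in the text.
KB = 1000  # bytes per kilobyte (decimal units are an assumption)

def xrf_map_size_bytes(n_points, spectrum_kb):
    """Total size of a map holding n_points spectra of spectrum_kb each."""
    return n_points * spectrum_kb * KB

low = xrf_map_size_bytes(50_000, 5)    # smallest spectra
high = xrf_map_size_bytes(50_000, 10)  # largest spectra
print(f"{low / 1e6:.0f}-{high / 1e6:.0f} MB")  # 250-500 MB
```

A 50,000-point map is therefore of the order of a few hundred megabytes, which is consistent with the later statement that XRF analysis remains manageable on a desktop computer.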

Science Studio software was developed to allow a user (and team) to control many aspects of the XRF experiment either while physically present at the VESPERS beamline or from a simple web browser located anywhere in cyberspace. As the XRF map is being collected, the completed portion of the map can be imported to the users’ desktop or elsewhere and analyzed using software (see below) that resides on their desktop. Thus, the user team will have a rapid assessment of the worth of the particular experiment and this can allow the experiment to proceed or to be terminated. The experimental data can be imported during or after the scan by as many individual users as are participating in the session and control of the experiment can be passed among qualified users. Thus, the lag in understanding of the experiment is reduced and the efficiency of the limited synchrotron time available is increased.

Figures 1(a-d) show three levels of experimental screens available to the users. In Figure 1(a) the information concerning the VESPERS beamline is displayed, including the availability of the photon beam, the beam lifetime and its intensity and sample geometry. In Figure 1(b) the screen is shown that allows the users to choose the exact dimensions of an XRF map, the size of the step in the raster scan along with the dwell time. The image of the sample is constantly refreshed and the progress of the beam within the scan region is marked by a moving spot. The screen in Figure 1(c) shows the XRF spectrum for each point in the scan as it is acquired. Parameters such as the position, XRF count rate, % dead time and incident beam count rate are displayed and stored, along with each spectral scan. The screen is also used to acquire test spectra prior to a full raster scan. At screen left the “tree” displays all raster scans acquired during the session; these scans can be organized (and reorganized) according to the participating experimental team, the sample on which the scan was made or the particular experimental session. Grouping of data under each of these categories can be important in recognizing commonalities in data characteristics. Figure 1(d) shows an alternative screen available for this level where an optical image of the sample and the scan region are displayed.
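The scan parameters chosen on the screen in Figure 1(b) (map dimensions, step size and dwell time) determine both the raster positions and a lower bound on the scan duration. The following is a minimal sketch of that relationship. The function names, the row-by-row ordering and the units (microns and seconds) are assumptions made for illustration and are not taken from the Science Studio code.

```python
def raster_points(x0, y0, width, height, step):
    """Yield (x, y) stage positions row by row over a width x height region."""
    nx = int(round(width / step)) + 1   # points per row, both edges included
    ny = int(round(height / step)) + 1  # number of rows
    for j in range(ny):
        for i in range(nx):
            yield (x0 + i * step, y0 + j * step)

def scan_duration_s(n_points, dwell_s, overhead_s=0.0):
    """Dwell time per point plus a hypothetical per-point stage overhead."""
    return n_points * (dwell_s + overhead_s)

# Hypothetical 100 x 50 micron region scanned in 5 micron steps, 1 s dwell.
points = list(raster_points(0.0, 0.0, width=100.0, height=50.0, step=5.0))
print(len(points))                         # 231 (21 columns x 11 rows)
print(scan_duration_s(len(points), 1.0))   # 231.0
```

An estimate of this kind lets a remote team judge, before starting, whether a proposed map fits within the beam time remaining in the session.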


Fig 1(a) Screenshot of the first VESPERS beamline page showing, from left to right, (i) the tree of data sources accessible to a particular user, (ii) the status of the photon beam condition as it enters the end station, (iii) a design diagram of all major components of the end station.


Fig 1(b) Screenshot of the second level XRF page for the VESPERS beamline, showing, from left to right, information on the scan progress and calibration, optical image of the sample showing the region position of the scan and the most recent point, scan stepping parameters.


Fig 1(c) Screenshot of the third level XRF page for the VESPERS beamline, showing the directory tree, the XRF spectral information about one point, and the XRF spectrum.


Fig 1(d) Screenshot of the third level XRF page for the VESPERS beamline, showing the directory tree, the XRF spectrum for an individual point and an image of the sample showing the exact point from which the spectrum was taken.


Users gain access to their tree through an authentication system that verifies their permission against records at each site where the user group has acquired data. To access Science Studio in order to conduct a new remote experiment the user authentication is verified against a schedule prepared by the User Office at the particular site.

Video and audio communication between users and the beamline scientist is facilitated by the availability of Skype and Google Talk services on the page. Moreover, a social network page is available that allows members of the user team to share results with a wider circle of colleagues. Analysis of XRF data is discussed later.

Remote Acquisition of X Ray Diffraction Data

The other capability provided by the VESPERS beamline at CLS is polychromatic X Ray Microscopy (PXM). Laue diffraction is produced by polychromatic X Rays back reflected from a sample. The wavelengths used and the highly coherent nature of the radiation allow the detection of minute changes to the spacing of interatomic layers in the sample. The energy of the X Rays used maximizes the detection of reflections from the top 50 microns of a sample. The diffraction spot pattern produced comprises reflections from many different planes in the sample crystal structure. Their placement will be characteristic of the type of unit cell, the interatomic spacing and the plane involved in the reflection. PXM is most profitably used to compare a result for an unknown sample with that for a perfect crystal of the same structure. Thus, small changes to the atom placement due to differing types of mechanical strains can be measured [3]. PXM uses a micron size beam of polychromatic X Rays to produce a Laue diffraction pattern for each micron area irradiated by the beam. Unlike Bragg diffraction it is unnecessary to rotate the sample. The X Ray diffraction pattern for each micron area can be analyzed by software to produce a map showing different types of strain present in the sample, thus providing information on the primary residual elastic strain directions. As well, the local mis-orientation of the sample crystal can provide information on plastic strain. Direction and shapes of the diffraction spots can also be analyzed to indicate the extent and direction of slip systems, dislocation density and the presence of dislocation walls.

The X ray beam and geometry used for PXM are the same as for XRF; the diffraction pattern produced from the interaction of beam and sample is collected on a CCD detector located a few cm above the sample (see Fig 1(a)). Just as in the case of an XRF map, an XRD map is collected by mechanically rastering the sample in a pattern. The difference is that the CCD detector collects an image of 4-8 MB at each point, thus providing a much larger data set to analyze. Further, the analysis process itself is vastly more complex, particularly if accurate strain information is to be obtained.
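The difference in scale between the two techniques can be made concrete with the per-point sizes quoted in the text (5-10 KB per XRF spectrum, 4-8 MB per CCD image). The sketch below assumes decimal units, and the 10,000-point map is a hypothetical size chosen only for illustration.

```python
KB, MB = 1_000, 1_000_000  # decimal units are an assumption

xrf_point = (5 * KB, 10 * KB)  # bytes per XRF spectrum (min, max)
xrd_point = (4 * MB, 8 * MB)   # bytes per CCD image (min, max)

ratio_min = xrd_point[0] / xrf_point[1]  # smallest image vs largest spectrum
ratio_max = xrd_point[1] / xrf_point[0]  # largest image vs smallest spectrum
print(ratio_min, ratio_max)  # 400.0 1600.0

n_points = 10_000  # hypothetical XRD map size, not a figure from the text
map_gb = (n_points * xrd_point[0] / 1e9, n_points * xrd_point[1] / 1e9)
print(map_gb)  # (40.0, 80.0)
```

Each XRD point thus carries several hundred to over a thousand times more raw data than an XRF point, before the greater analysis cost is considered.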

Laue XRD patterns can be collected remotely using Science Studio software. Users gain access to the VESPERS beamline using the same pages as for XRF. The XRD tab is selected and a screen similar to that in Figure 1(b) assists the users to set the region of the raster pattern, as well as step and dwell times. Detailed setup and monitoring of the XRD pattern collection conditions is done on the screen shown in Figure 2. On the left the “binning” or number of pixels to be used is selected, while on the right the exposure time is selected and the number of exposures taken is monitored. The diffraction images are displayed as they are collected on the screen in the center. The users may choose to send the XRD data immediately for analysis at the computation site or have it held for transmission at a later time.


Fig 2. Screenshot of the third level XRD page for the VESPERS beamline. Level one and two screens are common to XRF and XRD functions. From left to right are shown a panel indicating the CCD settings used for the particular CCD image, the image being acquired and information concerning the entire sequence of images being collected to form an XRD map.


Laue XRD patterns are also collected for analysis from a site at Beamline 12.3.2 at the Advanced Light Source (ALS). In this case, no controls of the beamline are available to remote users, but data may be transferred to the processing site either during acquisition or at the completion of a run.

Remote Acquisition of SEM and EDX Data

The Nanofabrication Laboratory at the University of Western Ontario has an advanced ion beam facility for preparation of nano-scale materials. A Science Studio site has been established to allow a remote user to acquire and save images from a LEO Scanning Electron Microscope (SEM) with the assistance of an operator. As well, the remote users may select regions for analysis by Energy Dispersive X ray (EDX) Analysis that operates using LINK software. The remote Science Studio user is able to work through this latter software to acquire and save EDX spectra from regions chosen by the user. Figure 3 shows a screenshot of a remote session at the nanofabrication laboratory with both SEM and EDX data displayed.


Fig 3. Screenshot of the management page for acquisition of remote SEM and EDX data from the Nanofabrication Laboratory facility at the University of Western Ontario.


(b) Transfer and Processing of Data
XRF Data Processing

A copy of all data acquired at a site belonging to Science Studio is retained at that site for a limited time. Users may import any of this data to their desktop via the internet at any time for viewing or analysis (see below).

In some cases, the volume of data is small enough for analysis to be manageable on an individual PC. Analysis of XRF spectral scans is readily done on the users’ desktop using software called “Peakaboo”. The software allows users to identify the spectral origins of the XRF spectrum using a routine that fits all components of the K or L spectrum including escape and pileup peaks and then plots their spatial intensity distributions as maps. The software provides a number of advanced mathematical filters that are used in noise reduction or in background attenuation or removal [1]. Parabolic or Brukner filters are particularly good for background removal, while a “spring” filter is found to be most efficient for noise removal. It takes 1 minute for the Java-based software to apply a spring filter to 5000 spectra acquired for an elemental distribution map, using an average dual core laptop. The line positions and relative intensities for each line series were taken from several sources. Fits take account of details such as separations and relative intensities of all Kα, Kβ, Lα, Lβ and Lγ lines. For most of the K series elements from Ca to Mo, the relative intensities of alpha and beta composite lines have been checked and adjusted for our lookup table using metals or compounds. For fitting of the spectral peaks, a Gauss-Lorentzian composite lineshape was used with preset widths. Thus, the identification of a particular element requires a close fit of multiple lines in the spectrum, each with its own shape. When the identification of a particular element is to be tested in the presence of overlapping lines, the unknown spectrum is fitted to whatever peak intensity is not already fitted by the lines from other elements.
The fitting algorithm developed does not allow all sets of elemental peaks to be freely variable to fill the available peak intensity envelope; rather, a criterion for introduction of a set of peaks for a new element is that there must be a discernible inflection in the peak shape. Several fitting sequences need to be tested to ensure that the solution is consistent. Peakaboo software normalizes intensities of X Ray spectra with respect to that of the incoming beam as well as the intensity of the argon K line produced by the surrounding air. A user guide to Peakaboo is available on the Science Studio website.
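As an illustration of a Gauss-Lorentzian composite lineshape with a preset width, the sketch below uses a linear mix of a Gaussian and a Lorentzian that share the same full width at half maximum (the pseudo-Voigt form). The text does not say which composite form Peakaboo uses, so the mixing scheme, the function names, the mixing fraction and the example width are all assumptions.

```python
import math

def gaussian(x, center, fwhm):
    """Unit-height Gaussian with the given full width at half maximum."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def lorentzian(x, center, fwhm):
    """Unit-height Lorentzian with the given full width at half maximum."""
    hw = fwhm / 2.0
    return hw * hw / ((x - center) ** 2 + hw * hw)

def gauss_lorentz(x, center, fwhm, height, mix=0.5):
    """Linear mix: mix=0 is pure Gaussian, mix=1 is pure Lorentzian."""
    return height * ((1.0 - mix) * gaussian(x, center, fwhm)
                     + mix * lorentzian(x, center, fwhm))

# Hypothetical peak near the Fe K-alpha energy (6.40 keV), 0.15 keV wide.
peak_top = gauss_lorentz(6.40, 6.40, 0.15, 100.0)
half_max = gauss_lorentz(6.475, 6.40, 0.15, 100.0)
print(peak_top)            # 100.0
print(round(half_max, 3))  # 50.0
```

Because both components share one width, the composite keeps that width for any mixing fraction, which is convenient when widths are preset rather than fitted.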

The work flow on Peakaboo revolves around a single screen where choices are made concerning filters and elemental spectra to be chosen. A screenshot, seen in Figure 4, shows a drop-down menu for spectral display options. To assist in identifying and mapping all elements in a scanned area, an average of all spectra can be displayed or one that searches for the strongest peaks in all spectra in a map. The software is particularly easy for novices to XRF measurements. The normal output format used in Science Studio XRF experiments is Common Data Format (CDF), one of the formats being proposed for general use within synchrotrons. However, Peakaboo can also be used with data formats from other XRF sources: for this, data can be read in ASCII format.


Fig 4. Screenshot of a page from the Peakaboo XRF spectral analysis software showing a “pull-down” table. Choices for representation of the composite XRF spectrum for all areas of a map are presented.


XRD Data Processing

It is becoming less practical to use a PC to carry out extended analyses of mapping data produced from Laue microscopic XRD studies. Thus, analysis of larger XRD maps is now being done at the “Process XRD” site at the University of Western Ontario. The XRD data can be transferred from the source site as it is being acquired or it can be transferred as one file after acquisition is complete. The screen used to set up the former type of transfer is shown in Figure 5. The transfer by lightpath is usually faster than the computation itself. Computation results appear in the “Data Import Session” file that is created with the transfer. The computation service itself functions as a cloud but does not retain the original data, except for cases where highly repetitive reprocessing is required. Process XRD can process several sets from multiple sites simultaneously; preference is given to data that requires processing in near-real time.


Fig 5. The screen used to set up the transfer of XRD data to Process XRD while the data is being acquired.


The software at Process XRD, called “FOXMAS” (Fast On-line X Ray Analysis Software), first locates the approximate position of the most intense spots in each CCD image. Then, the geometric center of each spot is determined as well as its shape: the shape of the spot provides information on dislocations present. Finally the spot centers are indexed according to crystallographic data that is provided by the users. The most common use of Laue XRD is to measure the deviations of the crystal structures compared to a model structure that is free from strain distortions. Thus, the deviations in crystal shape between an unknown sample and the model can be derived from the Laue diffraction pattern and the maps showing such strains can provide valuable information on microscopic distributions of local stresses/strains within a material. An early version of the analysis software was developed by Larson and co-workers [4] and was later refined by Kunz et al. [5]. The source code from the latter software, XMAS, was redeveloped into FOXMAS. FOXMAS is deployed on a parallel streaming platform called InfoSphere Streams, an IBM product [6]. The use of stream processing will enable large changes to be made to software and to the processing speed without major rewrites of the software itself. The XRD analysis is carried out using a cluster of eight-core and dual-core blades. The choice of fitting parameters to be used by Process XRD for each analysis is made on the User Interface shown in Figure 6. An XRD map produced by Process XRD [6] is shown in Figure 7. This particular map shows the grain orientations on a nickel sample, but strain maps are also determined simultaneously. Typically, such maps are calculated about 60 times faster using Process XRD than they would be with a modern (quad core) desktop computer.
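The center-and-shape step can be illustrated with image moments. The sketch below is not the FOXMAS algorithm: it swaps in an intensity-weighted centroid and second moments as a simple stand-in for the geometric center and spot shape, and it assumes the image region passed in contains a single spot. A real analysis must first separate the many spots in a Laue pattern. The function name and threshold are hypothetical.

```python
def spot_moments(image, threshold):
    """Intensity-weighted centroid and second moments of pixels above threshold.

    image is a list of rows of pixel values. Returns (cx, cy, mxx, myy, mxy),
    or None if no pixel exceeds the threshold.
    """
    pixels = [(x, y, value)
              for y, row in enumerate(image)
              for x, value in enumerate(row)
              if value > threshold]
    total = sum(v for _, _, v in pixels)
    if total == 0:
        return None
    cx = sum(x * v for x, _, v in pixels) / total
    cy = sum(y * v for _, y, v in pixels) / total
    mxx = sum(v * (x - cx) ** 2 for x, _, v in pixels) / total
    myy = sum(v * (y - cy) ** 2 for _, y, v in pixels) / total
    mxy = sum(v * (x - cx) * (y - cy) for x, y, v in pixels) / total
    return cx, cy, mxx, myy, mxy

# Tiny synthetic spot, streaked along x.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 1, 0],
    [0, 0, 0, 0, 0],
]
result = spot_moments(image, threshold=0)
print(result)  # (2.0, 1.0, 0.5, 0.0, 0.0)
```

Unequal second moments (here mxx > myy) indicate an elongated spot, which is the kind of shape information the text associates with dislocations. Each image can be treated independently, which is what makes the task well suited to parallel stream processing.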


Fig 6. The “submit” page for FOXMAS calculations for XRD indexing using Process XRD and the InfoSphere Streams-based calculation.


Fig 7. An XRD map result from Process XRD.


Overview of the Web-based Distributed Software

Science Studio is a distributed system that provides the end user with a common interface to the devices and analysis programs they need to run scientific experiments. The User Interface (UI) is run from a standard web browser and it communicates with the Beamline Services and applications over HTTP. To an outside user, this distributed application looks like one application, even though the data may be coming from different databases and the devices are located in different facilities.

Science Studio is built on a Service-Oriented Architecture (SOA) framework where business functions are treated as services. Figure 8 provides a high level overview of how the main services have been organized, and how Science Studio and the Complex Devices interact with the data processing elements that are being developed as a part of the Science Studio project.


Fig 8. Schematic of the services used by Science Studio for remote access and control of parts of the VESPERS beamline at the Canadian Light Source.


Access to Science Studio by most users is through a browser to a common portal at the University of Western Ontario. This portal handles the presentation services (Client Services) for all services regardless of where they are located. The services can reside on application servers on multiple nodes. The portal enables researchers from any location to access Science Studio facilities via the Internet. However, Science Studio software also allows each site to be accessed directly through its own web address. In this way, user access is less likely to be compromised by a service failure at the hub.

The Client Services Layer provides the server component presentation layer, the services managing the user’s interaction with the system. Unlike traditional web applications, the interface is event-driven, i.e. once the browser interface is initialized, it receives periodic messages from the UI services component indicating which sections of the interface are to be updated. An event-driven architecture is required due to the timeliness (“near real-time”) requirements of Science Studio.
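One way to picture this update mechanism is a feed that records which interface sections have changed, so that each periodic message names only the sections a given browser needs to redraw. The sketch below is an assumption about how such a feed could work. The class, the section names and the version counter are hypothetical and are not taken from the Science Studio code.

```python
class UpdateFeed:
    """Tracks which interface sections changed since a client last asked."""

    def __init__(self):
        self._version = 0
        self._changed_at = {}  # section name -> version at its last change

    def publish(self, section):
        """Record that a section (e.g. a beam status panel) has new content."""
        self._version += 1
        self._changed_at[section] = self._version

    def poll(self, since):
        """Return the sections changed after version `since`."""
        changed = sorted(s for s, v in self._changed_at.items() if v > since)
        return {"version": self._version, "sections": changed}

feed = UpdateFeed()
feed.publish("beam_status")
feed.publish("scan_progress")
first = feed.poll(0)
print(first)   # {'version': 2, 'sections': ['beam_status', 'scan_progress']}

feed.publish("scan_progress")
second = feed.poll(first["version"])
print(second)  # {'version': 3, 'sections': ['scan_progress']}
```

Sending only the names of changed sections keeps each message small, which helps when many team members watch the same session at once.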

The Business Object Layer provides the business logic between the User Interface and the provided Services. It is made up of three main areas: the User Office, Lab or Experiment Management and Device Management. The User Office is an abstraction business layer for all of the user services. User Services provides the mechanism to create, retrieve, and modify metadata relating to experiments such as projects, sessions, and samples. The metadata assists in organizing, automating, and improving the experiment process and data. The primary functions of the Lab or Experiment Management service are to provide services for setting up experiments, configuring experiments and reviewing the results of experiments. The Device Management service provides access for running experiments and collecting and recording data. The service contains device-specific business logic and provides an abstraction for the user from the actual complexity of the scientific devices. It is a layer on top of, not a duplication of, the control systems provided by device-specific control software frameworks like EPICS IOCs. There is one device manager service deployed per higher level abstract device, for example: a beamline, an electron microscope, or a camera. The Device Control Modules form a framework provided to make it easier to connect to devices or to the control software that controls the devices. This Device Control Framework provides a standard interface for Science Studio to access control software. This interface can remain static even when there are changes made to Science Studio or to control software. Currently there is an interface implemented to integrate with Input Output Controllers for EPICS, a common device control framework. The Device Control Modules interface leverages message queue infrastructure.

The Device Proxies provide a further layer of abstraction to the Device Control Modules, and allow Science Studio to access multiple diverse devices from one simple interface, removing the underlying complexities of the device.
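The layering described above can be sketched as an abstract control interface with a proxy on top. This is a simplified illustration under stated assumptions, not the Science Studio implementation (which is Java-based): the class names, method names and device names are hypothetical, and an in-memory dictionary stands in for EPICS or vendor control software.

```python
from abc import ABC, abstractmethod

class DeviceControlModule(ABC):
    """Standard interface that higher layers program against."""

    @abstractmethod
    def read(self, name): ...

    @abstractmethod
    def write(self, name, value): ...

class InMemoryControl(DeviceControlModule):
    """Fake control system used here in place of real control software."""

    def __init__(self, initial):
        self._values = dict(initial)

    def read(self, name):
        return self._values[name]

    def write(self, name, value):
        if name not in self._values:
            raise KeyError(name)
        self._values[name] = value

class DeviceProxy:
    """Exposes friendly names and hides the control-system naming underneath."""

    def __init__(self, control, aliases):
        self._control = control
        self._aliases = dict(aliases)

    def get(self, alias):
        return self._control.read(self._aliases[alias])

    def set(self, alias, value):
        self._control.write(self._aliases[alias], value)

# Two very different devices reached through the same proxy interface.
beamline = DeviceProxy(InMemoryControl({"DEMO:stage:x": 0.0}),
                       {"stage_x": "DEMO:stage:x"})
microscope = DeviceProxy(InMemoryControl({"sem/magnification": 1000}),
                         {"magnification": "sem/magnification"})
beamline.set("stage_x", 25.0)
print(beamline.get("stage_x"), microscope.get("magnification"))  # 25.0 1000
```

Because callers see only `get` and `set` on friendly names, the control software behind a device can change without changes to the layers above it.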

Service Proxies provide a common access to many services. It is the service proxies that provide access to remote or external data services and to processing services. It is these services that provide the integration glue for connecting Science Studio “nodes” and for connecting to other applications.

The Persistence Layer provides access to Science Studio’s internal database. It provides access to the metadata that is used for coordinating the collaboration of tasks, projects, data, device parameters, and system configuration and settings. It can provide the security controls if the Science Studio node is not accessing a shared security service.

Beamline Control Software

The web application uses the J2EE Servlet API to provide a web-based user interface to users of Science Studio. This web application uses the Spring framework to provide inversion of control using its Model-View-Controller (MVC) implementation. Object Relational Mapping (ORM) support is provided by the iBATIS framework which cleanly isolates SQL commands within XML mapping files. The security framework Apache Ki is used for authentication and authorization functionality. Currently, this web application is deployed on an Apache Tomcat application server.

The Science Studio database contains metadata associated with the operation of a remote controlled beamline and the organization of experimental data collected on that beamline. A “project” is the top level organizational unit, and is associated with a project team. Each team member is assigned a role, Observer or Experimenter, within the project. Experimenters have full access to the project, whereas Observers have read-only access.
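The two project roles map naturally onto a small permission check. The sketch below assumes a split of actions into read and write sets; the action names and the team members are hypothetical, and only the role names and their access levels come from the text.

```python
from enum import Enum

class Role(Enum):
    OBSERVER = "Observer"
    EXPERIMENTER = "Experimenter"

# Hypothetical action names; only the read/write distinction is from the text.
READ_ACTIONS = {"view_scan", "download_data"}
WRITE_ACTIONS = {"start_scan", "edit_sample", "delete_scan"}

def is_allowed(role, action):
    """Observers may read; Experimenters may read and write."""
    if action in READ_ACTIONS:
        return True
    if action in WRITE_ACTIONS:
        return role is Role.EXPERIMENTER
    return False  # unknown actions are refused

team = {"alice": Role.EXPERIMENTER, "bob": Role.OBSERVER}
print(is_allowed(team["alice"], "start_scan"))  # True
print(is_allowed(team["bob"], "start_scan"))    # False
print(is_allowed(team["bob"], "view_scan"))     # True
```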

The Beamline Control Module (BCM) is a Java application which provides a high-level interface to the low-level control system. In this case, EPICS is the low-level control system and the BCM communicates with it using a Java implementation of the Channel Access protocol. The BCM provides a device abstraction so that alternate low-level control systems can be used. This is important for use of the BCM outside of the CLS. EPICS is the standard control system at the CLS and is used for control and data acquisition of nearly every device. EPICS consists of a network of Input-Output Controllers (IOCs) which are connected directly to devices. Each IOC provides a number of Process Variables (PVs), each of which has a unique name and relates a value to either an input or an output of a device. The Channel Access (CA) protocol is used to read or write to any PV in the network without needing to know which IOC provides the PV.
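The key property described here, that a client addresses a PV by name alone, can be shown with a toy model. The sketch below is not the Channel Access protocol and does not use any EPICS library: a lookup table stands in for the network-wide name search, and the IOC and PV names are hypothetical.

```python
class Ioc:
    """Stand-in for an Input-Output Controller hosting a few PVs."""

    def __init__(self, name, pvs):
        self.name = name
        self.pvs = dict(pvs)

class PvNetwork:
    """Resolves a PV by its unique name, hiding which IOC serves it."""

    def __init__(self, iocs):
        self._owner = {}
        for ioc in iocs:
            for pv in ioc.pvs:
                if pv in self._owner:
                    raise ValueError(f"duplicate PV name: {pv}")
                self._owner[pv] = ioc

    def get(self, pv):
        return self._owner[pv].pvs[pv]

    def put(self, pv, value):
        self._owner[pv].pvs[pv] = value

stage = Ioc("ioc-stage", {"DEMO:stage:x": 0.0, "DEMO:stage:y": 0.0})
detector = Ioc("ioc-detector", {"DEMO:sdd:deadtime": 4.2})
network = PvNetwork([stage, detector])

network.put("DEMO:stage:x", 12.5)  # caller never names the IOC
print(network.get("DEMO:stage:x"), network.get("DEMO:sdd:deadtime"))  # 12.5 4.2
```

The uniqueness check in the constructor reflects the requirement in the text that each PV has a unique name; without it, name-only addressing would be ambiguous.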

Processing Management Software

The protocol chosen to allow all users to search for their data across all sites is a REST interface, where HTTP requests can be used to query the availability of data with a response in JSON format. To decouple different Science Studio deployments an Enterprise Service Bus (ESB) has been used. In this approach, each Science Studio site is only connected to the bus and can only send queries or receive responses through the bus. Actual transfer of data occurs using an IBM version of MQ (Message Queuing) software. This software enables the virtually simultaneous transfer of data from multiple sources in the world to the Science Studio hub at the University of Western Ontario and the subsequent processing of these files, either in near real time as the data is being collected or in batch mode sometime after collection is completed. To be efficient, it was essential that a messaging/queuing function operate seamlessly with the version of InfoSphere Streams being used for processing, as well as with the Science Studio at each site.
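A data-availability query of this kind might look as follows from the client side. The sketch is illustrative only: the endpoint path, the query parameters and every field in the JSON body are hypothetical, since the text does not describe the actual Science Studio REST schema. No network request is made; a canned response string is parsed instead.

```python
import json
from urllib.parse import urlencode

def build_query_url(base_url, project, technique):
    """Compose a GET URL for a (hypothetical) data-availability endpoint."""
    query = urlencode({"project": project, "technique": technique})
    return f"{base_url}/data?{query}"

def scan_ids(body):
    """Extract scan identifiers from a JSON response body."""
    return [scan["id"] for scan in json.loads(body)["scans"]]

url = build_query_url("https://site.example/api", "demo", "XRF")
print(url)  # https://site.example/api/data?project=demo&technique=XRF

# Canned response standing in for what one site might return via the bus.
sample_response = (
    '{"site": "CLS", "scans": ['
    '{"id": 17, "technique": "XRF", "points": 5000}, '
    '{"id": 18, "technique": "XRF", "points": 50000}]}'
)
print(scan_ids(sample_response))  # [17, 18]
```

Since each site answers with the same response shape, a hub can merge the replies from several sites into one listing for the user.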

The FOXMAS on-line XRD software uses IBM’s InfoSphere Streams to process Laue XRD data in parallel streams working on a bank of servers containing as many as 36 “workers” on 10 blades. Some cache storage is available on disks to allow short term retention of data that is being reprocessed frequently. Otherwise raw data is not retained at the processing site; only the results of the processing are kept, and these can be downloaded to the users’ sites. InfoSphere Streams is integrated with MQ to allow processing of data coming from more than one site. The Java-based UI provides a variety of processing options to a typical user of the Laue XRD Beamline (12.3.2) at ALS, as well as at the VESPERS beamline at CLS.

Process XRD can handle requests for analysis from multiple users concurrently, with the data flowing from more than one site. User requests are prioritized and scheduled using MQ. The software currently processes XRD data from Princeton CCD, Mar 133 and Dectris Pilatus 1M pixel array detectors. Process XRD also has an on-line mapping service displaying maps such as orientation, elastic strain tensors, composite (von Mises) strains and diffraction spot ellipticity.
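The prioritization of requests can be illustrated with a small priority queue in which near-real-time jobs are served before batch jobs and jobs of equal priority are served in arrival order. This is a sketch only: the production system schedules with MQ, and the two priority classes and the job identifiers used here are assumptions.

```python
import heapq
import itertools

class JobQueue:
    """Serves near-real-time jobs first; ties are broken by arrival order."""

    REAL_TIME, BATCH = 0, 1  # lower number is served first

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def submit(self, job_id, priority):
        heapq.heappush(self._heap, (priority, next(self._arrival), job_id))

    def next_job(self):
        """Pop the next job identifier, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

queue = JobQueue()
queue.submit("batch-ALS-001", JobQueue.BATCH)
queue.submit("live-CLS-007", JobQueue.REAL_TIME)
queue.submit("batch-CLS-002", JobQueue.BATCH)

order = [queue.next_job() for _ in range(4)]
print(order)  # ['live-CLS-007', 'batch-ALS-001', 'batch-CLS-002', None]
```

The arrival counter keeps ordering fair within a priority class, so a batch job from one site is not overtaken by a later batch job from another.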

All Science Studio sites are linked by the CANARIE network in Canada and ESnet in the US, where a 10G lightpath is the normal mode of operation. At present, where only one processing job is in the queue, the processing speed is limited by the speed of the processor. Upgrade to a 100G link is planned for the next year in advance of anticipated multiple job queues and an increase in the number of blades for the service.
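The statement that processing, rather than the link, is the present bottleneck is plausible from a simple estimate of ideal transfer times. The sketch below assumes the full line rate is available, decimal units, and no protocol overhead, so real transfers will be slower than these figures.

```python
def transfer_time_s(size_bytes, link_gbps):
    """Ideal time to move size_bytes over a link of link_gbps gigabits/s."""
    return size_bytes * 8 / (link_gbps * 1e9)

image_bytes = 8_000_000  # one 8 MB CCD image, the upper size quoted earlier

t_10g = transfer_time_s(image_bytes, 10)
t_100g = transfer_time_s(image_bytes, 100)
print(f"{t_10g * 1000:.2f} ms at 10G, {t_100g * 1000:.2f} ms at 100G")
# 6.40 ms at 10G, 0.64 ms at 100G
```

At a few milliseconds per image, a single acquisition stream uses only a small fraction of a 10G lightpath; the planned upgrade matters mainly once several sites transfer concurrently.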

Peakaboo, the analysis program for XRF spectral data, is downloadable from the Science Studio website. Peakaboo is written in Java and runs as a cross-platform desktop application. It allows users to identify the spectral origins of XRF data using a routine that fits all components of the spectrum including escape peaks and pileup peaks, and then plots their spatial intensity distributions as maps. It has support for several data formats and includes filters to clean up or otherwise manipulate data. More interested users can easily add their own filters and data formats using Peakaboo's plugin system.

Authentication

Authentication of users is particularly important since their identity must be verified by every site that they have used. The Central Authentication Service (CAS) is a single sign-on protocol for the web which allows a user to access multiple applications while providing their credentials (such as user identification and password) only once. It also allows web applications to authenticate users without gaining access to a user's security credentials, such as a password. CAS allows a user to access multiple resources available at different facilities (e.g. UWO, CLS, etc.) by entering the credentials only once at the first entry point. For example, a user can log in to the Science Studio deployment at CLS and, when he/she wishes to use the Process XRD web application at UWO, access will automatically be authorized without the necessity of re-entering credentials.
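The single sign-on flow can be summarized in a toy model: the user logs in once and receives a session token, the server issues a single-use ticket for each service the user visits, and the service validates that ticket with the server without ever seeing the password. The sketch below is a simplified in-memory illustration, not a CAS implementation. The class, the token prefixes, the user name and the service URLs are hypothetical, and plain-text password storage is used only to keep the example short.

```python
import secrets

class CasServer:
    """Toy single sign-on server: one login, one-time tickets per service."""

    def __init__(self, passwords):
        self._passwords = dict(passwords)  # plain text only for illustration
        self._sessions = {}                # session token -> user name
        self._tickets = {}                 # service ticket -> (user, service)

    def login(self, user, password):
        """Check credentials once and return a session token, or None."""
        if self._passwords.get(user) != password:
            return None
        token = "TGT-" + secrets.token_hex(8)
        self._sessions[token] = user
        return token

    def grant_service_ticket(self, session_token, service):
        """Issue a ticket for one service without asking for the password."""
        user = self._sessions.get(session_token)
        if user is None:
            return None
        ticket = "ST-" + secrets.token_hex(8)
        self._tickets[ticket] = (user, service)
        return ticket

    def validate(self, ticket, service):
        """Called by the service; tickets are single-use and service-bound."""
        user, issued_for = self._tickets.pop(ticket, (None, None))
        return user if issued_for == service else None

cas = CasServer({"jdoe": "s3cret"})
session = cas.login("jdoe", "s3cret")  # credentials entered once
ticket = cas.grant_service_ticket(session, "https://uwo.example/process-xrd")
user_first = cas.validate(ticket, "https://uwo.example/process-xrd")
user_replay = cas.validate(ticket, "https://uwo.example/process-xrd")
print(user_first, user_replay)  # jdoe None
```

The second validation fails because the ticket has already been consumed; binding each ticket to one service and one use limits the damage if a ticket is intercepted.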

Project Outcomes

The functionality of each section has been tested by our team as it was developed using live sessions. Most parts of the software have also been exposed to a limited extent to selected external users. The Science Studio software for external control of XRF functions of the VESPERS beamline has been tested by several groups who suggested numerous improvements to the screens in Figures 1(a-c). Tests included user groups as far away as Australia and as many as six separate locations. The operation of this part of Science Studio is robust and the latency factor is adequate for most controls. User groups have found it difficult to navigate the tree structure when joining a session. As well, it has still been burdensome to arrange for the level of constant beamline support that is necessary with remote users, and that has reduced the level of interest among potential users. At the outset of the project it was decided to exclude the control of those functions that might cause significant compromise to the beamline if a control should malfunction; however, that strategy has reduced the level of independence of an external user, thus perhaps diminishing its attraction in some cases. In this respect, the team support aspect of Science Studio has not yet been fully tested with a contiguous group of scientific users. Such group interactions are the underlying reason for this project.

Use of the remote service at the nanofabrication laboratory is also at a very early stage, with one user group. In some respects this service should be easier to promulgate because of the less formal scheduling requirements. The processing service, by contrast, has fewer strictures and is in use by groups at CLS and ALS.

Concluding Remarks

The Science Studio project is configured to provide new types of services to users of major scientific facilities. While there have been many demonstrations of point-to-point remote operation of synchrotron beamlines[ ] and remote services in protein crystallography are becoming routine[ ], Science Studio attempts to present these beamline services to users as one part of the tasks that they encounter in moving an experiment through its various stages from conception to publication. Thus, the nanofabrication facility was included in this prototype because most users of micro XRF and XRD require prior microscopic analysis of their samples, and this data needs to be an intimate part of the decisions made in the interpretation of the synchrotron-based experiments. Further, the results of the many processing procedures need to be a visible and open part of the discussion of the experiment.

The project has integrated X-ray diffraction (XRD) data processing into Science Studio in such a way that users can gain rapid access to sophisticated processing services from their desktop with relatively minor advance preparation. This process mimics that of commercial cloud services; however, the Science Studio network provides an additional powerful benefit: data transfer costs are substantially lower since the network is an academic one and no external storage of data is required. Of prime importance is the appearance of the processed result at the user's desktop within seconds of the event, thus facilitating decisions on the course of the next experiment. It would be possible to apply the Science Studio infrastructure developed here to other types of complex computations on time-sensitive data. Such a service would best reside at a site where the software and its outcomes are best understood.

The stream computing technology is readily expandable to accommodate improved processors without any major changes to the software. This type of computation service becomes cost effective as an increasing number of users of synchrotron facilities require computing power that keeps pace with the increasing output of new detectors.

As an end-to-end service, the value of Science Studio as a means for the protection of data integrity cannot be overemphasized. It is possible for a scientific team to demonstrate the provenance of all data sources, the calculations and assumptions used in any computation, and the access that all team members have had to the refinement of the output. This may prove to be a major strength of Science Studio in commercial applications of major science resources.

References

1. N.S. McIntyre, Nathaniel Sherry, Marina Suominen Fuller, Renfei Feng and Thomas Kotzer, Journal of Analytical Atomic Spectrometry, 25, 1381 (2010).
2. R. Feng, A. Gerson, G.E. Ice, R. Reininger, B. Yates and N.S. McIntyre, AIP Conference Proceedings, 879, 872 (2007).
3. J.S. Chung and G.E. Ice, J. Appl. Phys., 86, 5249 (1999).
4. Y. Yang, B.C. Larson, J.Z. Tischler, J. Budai, G.E. Ice, Micron, 35, 431 (2004); R. Barabash and G.E. Ice, Encyclopedia of Materials: Science and Technology Updates, Elsevier Press, (2005) 1-18.
5. M. Kunz, N. Tamura, K. Chen, A.A. McDowell, R. Celestre, M. Church, S. Fakra, E. Domning, J. Glossinger, J. Kirschman, D. Mossison, D. Plate, B.V. Smith, T. Warwick, V. Yashchuk, H.A. Padmore, and E. Ustundag, Rev. Sci. Instr., 80, 035108 (2009).
6. M.A Bauer, A. Biem, N.S. McIntyre, Y. Xie, 256, 012017 (2010).
7. J. Chao, M.L.S. Fuller, N.S. McIntyre, A.G. Carcea, R.C. Newman, M. Kunz and N. Tamura, Acta Materialia, 60, 781-792 (2012).
8. M. A . Bauer, A. Biem, S. McIntyre, N. Tamura and Y. Xie,
9. www.esrf.eu/UsersAndScience/User Guide

Acknowledgements

The authors gratefully acknowledge the major contributions from CANARIE to this work, including funding and useful guidance and suggestions for this program. Additional support has come from Canadian Light Source, CANARIE, The University of Western Ontario, the Canada Foundation for Innovation and the Natural Sciences and Engineering Research Council.