Thursday 16 May 2013

CMM's Chemistry X-ray Facility is on Mirage

As of a couple of days ago, the "B2" and "B5" instruments in the CMM's Lab in Chemistry are now connected to Mirage.

So now all 4 CMM Labs are "on Mirage" to some extent.

Technical stuff

The Chem Lab instruments don't save their data directly to a Samba server (like the other Labs).  Instead, they save their data locally, and the data is "pulled" by a periodic rsync from CMMData.  (This used to happen once a day, but it now happens every 20 minutes.)

This presented a couple of "interesting challenges":
  • The data grabber software needed significant changes to cope with files delivered by rsync.  The files get touched repeatedly by the rsync, so we need to take special steps to prevent them being ingested multiple times.
  • The B5 machine is one of a few machines that are operated exclusively by CMM staff, who do runs on behalf of CMM users. So we needed to implement a special mechanism for automatically assigning files to the right CMM user.
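
To illustrate the first of these challenges, here is a minimal sketch of one way to avoid re-ingesting a file that rsync has merely touched. It is not the actual data grabber code: the `IngestLog` class and `fingerprint` function are hypothetical names, and the approach (comparing a content hash rather than the modification time) is an assumption about what "special steps" could look like.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return a SHA-256 hex digest of the file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that large instrument files are not loaded whole.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


class IngestLog:
    """Hypothetical record of ingested files, keyed by path."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}

    def should_ingest(self, path: Path) -> bool:
        """True only if the file is new or its contents have changed."""
        key = str(path)
        digest = fingerprint(path)
        if self._seen.get(key) == digest:
            # Same contents as last time: rsync only touched the timestamp.
            return False
        self._seen[key] = digest
        return True
```

Because the decision depends on the contents and not on the timestamp, a file whose modification time is reset by each rsync pass is ingested once, while a file that is genuinely rewritten is picked up again.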

Friday 26 April 2013

Recent changes to the Mirage User Interface

There have been a number of changes to the Mirage user interface in the past few days.

Source metadata

Mirage now displays "administrative" metadata about the data capture process; e.g. who was logged in, what instrument was used, and the "save" file and folder names for each captured Datafile.

Downloaded ZIP / TAR files

When you download an Experiment or selected Datasets or Datafiles, the TAR / ZIP archive now has file and folder names that match the names used to save the files on the instrument.  If you are downloading an entire Experiment, you can still get the "classic" MyTardis organization.

Update - 2013-05-06:  Mac users are no longer offered the option of downloading in ZIP format because of Mac's problems with reading the "streamed" ZIP file format that MyTardis uses.  You can use TAR format instead.

Sharing data with other Mirage users

It is now possible to share Experiments with other Mirage users or groups, using the "sharing" tab in the Experiment view.  You can grant read-only access, read-write access, or full ownership rights.

If you wish to create a Mirage group for sharing data within your research group, please contact me.

Credits

The "data sharing" changes and other ones that you may notice are the work of Steve Androulakis and Grischa Meyer at Monash Uni.  For the MyTardis audience, this is the result of the "master-3.0 merge".  Kudos to Steve A!

Sunday 7 April 2013

QBP Lab is now on Mirage

The campaign to integrate the instruments in QBP Labs with the rest of CMM is now (nearly) finished.

On Friday I finalized the deployment of the data grabber and ACLS proxy to the QBPDATA server (the replacement for DejaVu) and hooked up the first instrument.  This morning I hooked up the Atom feed to the production Mirage system, and ... the data has started to flow.

The last step is to hook up as many of the QBP instruments as possible.

Tuesday 12 March 2013

Data Migration in production on Mirage

The long awaited MyTardis "datafile migration" extensions are now deployed to CMM's Mirage production system.  The data is being mirrored as I write this.

What does this mean?  The short answer is that I can now add the rest of CMM's instruments to Mirage without filling our local file servers with a few months of data.  So expect to see the remaining AIBN and Hawken instruments added to Mirage in the next few days and weeks.



For people interested in developing / deploying MyTardis, there is a "pull request" waiting for merging into mytardis:master on Github.  Read the pull request comments for details on what is involved.

Update: The pull request has been merged, so the "migration" code is now in the "master" MyTardis codebase.

Sunday 17 February 2013

The Mirage project web pages

If you are interested in how Mirage is implemented, and/or are considering implementing your own data management system, take a look at the Mirage Project web pages.  They include an architectural overview, a description of the components we developed, and a section on how we interface with ACLS.

Monday 5 November 2012

Progress on Mirage Data Migration

The Disk Space Issue

The main issue that is preventing us from rolling out Mirage for the majority of CMM's instruments is disc space.  Basically, if we did a "full" roll-out, we would fill up the available disc space in a small number of months, and then we would have to start deleting older files to make space.  That would mean that users would need to manage their own long-term file storage, and we would essentially be back where we started.

Then there is the new 3View system, which is capable of generating 10 GB of data in a single night, or 20 GB after rendering the images as TIFF files.

Managing Lots Of Data

We are addressing the issue of "too much data, not enough disc space" in two ways:
  • For data that needs to be kept online, we are currently implementing a scheme where individual data files are "migrated" to secondary online locations within UQ.  As far as users are concerned, the data files will be accessed as before, except that access will be just a little bit slower.
  • For data that no longer needs to be online, we will be implementing an archiving scheme in which snapshots of entire experiments complete with all relevant metadata are saved to offline storage.

Progress So Far

I am currently developing the code for the datafile migration subsystem for Mirage.  The basic file migration code is working in the Mirage test system, and the code that will decide what files to migrate and when to migrate them is in progress.  The initial migration system will take into account the size of the individual files, their file types, and when they were created and last accessed.  Later on, I intend to allow users to indicate the relative importance of files, datasets and experiments to influence the migration decision making.
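
As a rough illustration of that kind of decision, here is a minimal sketch of a rule that looks at file size, file type, creation time and last access time. It is not the Mirage migration code: the `FileInfo` class, the `should_migrate` function, and every threshold and file suffix below are hypothetical values chosen only for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Tuple


@dataclass
class FileInfo:
    """The facts about a data file that the rule looks at (hypothetical)."""
    name: str
    size_bytes: int
    created: datetime
    last_accessed: datetime


def should_migrate(
    info: FileInfo,
    now: datetime,
    min_size_bytes: int = 100 * 1024 * 1024,           # assumed threshold
    min_age: timedelta = timedelta(days=30),            # assumed threshold
    min_idle: timedelta = timedelta(days=14),           # assumed threshold
    keep_local_suffixes: Tuple[str, ...] = (".txt", ".log"),  # assumed types
) -> bool:
    """Decide whether a file is a candidate for secondary storage."""
    # File type: some kinds of file always stay on the local server.
    if info.name.lower().endswith(keep_local_suffixes):
        return False
    # Size: moving small files frees very little space.
    if info.size_bytes < min_size_bytes:
        return False
    # Creation time: recently created files are likely still in use.
    if now - info.created < min_age:
        return False
    # Last access: only migrate files that nobody has read lately.
    return now - info.last_accessed >= min_idle
```

A user-supplied "importance" rating, as mentioned above, could later be added as one more input that overrides or adjusts these checks.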

(For MyTardis folks, the Mirage migration code is actually a MyTardis "app".  To use it, you will need to set up one or more secondary "destinations", which can simply be private WebDAV servers on some other machine with lots of disk space.  Look in my MyTardis repo on GitHub for the code.)

The other aspect that needs to be sorted out is actual disk space provisioning.  I have negotiated some space on the UQ HPC cluster for interim storage, but "the real thing" will be implemented on the QERN system that QCIF is currently developing.  We are currently "on the list" for transition onto QERN.

Tuesday 23 October 2012

Mirage is in production

Mirage has been in limited production on the following CMM instruments for a few months:
  • The JEOL Neoscope in the Hawken Lab
  • The JEOL 6610 in the Hawken Lab
  • The JEOL 6440 in the Hawken Lab
  • The JEOL 7100 in the AIBN Lab
If you are a CMM user of one of these instruments, this means that you have a new way to access and organize the data from your sessions on these instruments.

How to access your data

Simply do the following:
  1. Open a web browser and visit "http://mirage.cmm.uq.edu.au"
  2. Click the "Login" button or link.
  3. Enter your ACLS account name and password, and click "Login"
  4. Now you can either click the "Data" link to go to your data, or click the "Getting Started" link for an overview of how to use Mirage.

How your data got there

Data files written to the "S:" drive on the instrument are "grabbed" and assembled into Datasets based on their file names and when they were written.  If the grabber can work out who the Datasets belong to, they are sent to Mirage automatically.

If the data grabber can't work out who Datasets belong to, they are kept in the instrument's "hold" queue for a couple of weeks, or until someone claims them.  This will typically happen if you forget to log in to the instrument using your own ACLS account.  Always remember to log in and log out.
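
For the curious, here is a minimal sketch of how files might be assembled into Datasets from their names and write times. The grabber's real rules are not described here, so this is only an assumption: the `base_name` and `group_into_datasets` functions are hypothetical, as are the 10 minute gap and the idea of stripping a trailing sequence number from the file name.

```python
import re
from typing import List, Tuple


def base_name(filename: str) -> str:
    """Strip the extension and any trailing sequence number from a name."""
    stem = filename.rsplit(".", 1)[0]
    return re.sub(r"[-_ ]?\d+$", "", stem)


def group_into_datasets(
    files: List[Tuple[str, float]],
    max_gap_seconds: float = 600.0,  # assumed: 10 minutes between writes
) -> List[List[str]]:
    """Group (filename, write time in seconds) pairs into datasets.

    Files are taken in write order.  A file joins the current dataset if
    it has the same base name as the previous file and was written soon
    after it; otherwise it starts a new dataset.
    """
    datasets: List[List[str]] = []
    previous = None
    for name, written in sorted(files, key=lambda item: item[1]):
        same_dataset = (
            previous is not None
            and base_name(name) == base_name(previous[0])
            and written - previous[1] <= max_gap_seconds
        )
        if same_dataset:
            datasets[-1].append(name)
        else:
            datasets.append([name])
        previous = (name, written)
    return datasets
```

A scheme along these lines would also explain why renaming files on the "S:" drive can confuse the grabber: the name is part of what ties a file to its Dataset.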

Why is your data missing from Mirage?

  • If your data was saved before the instrument was Mirage enabled, then we don't have it.
  • It might be stuck in the "hold" queue ... because you forgot to log in.
  • It might have been sent to someone else's account ... if you forgot to log in and they forgot to log out!
  • If you are in the habit of renaming files on the "S:" drive, you may have confused the grabber.  The data is probably there under the old name. (Wait until you can access the files in Mirage before you start renaming and reorganizing things.)
And if none of that helps, your data should still be accessible in the old way via the "R:" drive on one of the Lab PCs.  (But of course, you only have a week to retrieve it from there.)