Dataset sizes


Viewing 4 posts - 1 through 4 (of 4 total)
  • #552
    David Morgan
    Participant

      Hi,

      We are beginning to think about IT needs for FIB/SEM and SBFI. Can anyone give me any sort of estimate of “typical” dataset sizes? I know this is a hard topic to condense to single numbers, but at the moment, any sort of estimates based on experience would be useful. Thanks.

      #553
      AdminEM3
      Keymaster

        Hi Dave

        Our standard VolumeScope imaging run lasts about 24 hours, in which we collect between about 30 GB and 150 GB of raw data. Good staining allows faster scans, and therefore more data.

        We keep the raw data intact and make a second set of 8-bit TIFF images, plus scaled-down (50% and 25%) versions. We also make enhanced versions of the data for some work, using CLAHE filtering and ROF denoising. This roughly doubles the data volume, allowing that some of the raw data doesn’t require duplication.
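As a rough check on the "about doubles" figure, here is a minimal sketch of the arithmetic. It assumes things the post does not state: that raw images are 16-bit, that all derived sets are 8-bit, that "50%" and "25%" refer to linear scaling of each image axis, and that one full-size enhanced copy is kept. The function names are made up for illustration.

```python
# Rough size of the derived image sets relative to the raw data.
# Assumptions (not from the thread): raw images are 16-bit, every derived
# set is 8-bit, and "50%" / "25%" mean linear scaling of each image axis.

RAW_BITS = 16      # assumed raw bit depth
DERIVED_BITS = 8   # 8-bit TIFF copies, as described above


def relative_size(scale: float, bits: int = DERIVED_BITS) -> float:
    """Size of one derived set as a fraction of the raw data size."""
    # Pixel count scales with the square of the linear scale factor.
    return (scale ** 2) * bits / RAW_BITS


def derived_fraction(n_enhanced: int = 1) -> float:
    """Total derived data as a fraction of raw.

    n_enhanced is the assumed number of full-size enhanced copies
    (for example a CLAHE-filtered or ROF-denoised set).
    """
    full = relative_size(1.0)      # full-size 8-bit copy
    half = relative_size(0.5)      # 50% scaled copy
    quarter = relative_size(0.25)  # 25% scaled copy
    return full + half + quarter + n_enhanced * full


if __name__ == "__main__":
    frac = derived_fraction()
    for raw_gb in (30, 150):  # raw range per 24 h run quoted above
        total = raw_gb * (1 + frac)
        print(f"raw {raw_gb} GB -> about {total:.0f} GB including derived sets")
```

Under these assumptions the derived sets add a little more than the raw size again, which is in line with the estimate above. Changing `RAW_BITS` or `n_enhanced` shows how sensitive the total is to what you actually keep.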

        You can compress images. Lossless TIFF compression (LZW) works just fine and might save some space. I don’t recommend it, but you may also elect to JPEG-compress some of the working datasets. JPEG at high quality can shrink files 2- to 10-fold without much artifact, but lossy compression (like JPEG) definitely creates artificial structures in the data if you are not careful.

        You can also compress whole datasets. Zip or tar.gz archives can shrink your datasets somewhat for storage.
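For anyone who wants to script the archiving step, a minimal sketch using Python's standard `tarfile` module might look like this. The folder and file names are hypothetical, and the demo uses synthetic all-zero files, which compress far better than real micrographs will.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_dataset(dataset_dir: Path, archive_path: Path) -> float:
    """Pack dataset_dir into a gzip-compressed tar archive.

    Returns archive size divided by original size (smaller is better).
    """
    original = sum(f.stat().st_size for f in dataset_dir.rglob("*") if f.is_file())
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the dataset folder
        tar.add(dataset_dir, arcname=dataset_dir.name)
    return archive_path.stat().st_size / original if original else 0.0


if __name__ == "__main__":
    # Demonstration on throwaway synthetic files, not real microscope data.
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "run_001"  # hypothetical dataset folder
        data.mkdir()
        for i in range(3):
            (data / f"slice_{i:04d}.raw").write_bytes(bytes(1_000_000))
        ratio = archive_dataset(data, Path(tmp) / "run_001.tar.gz")
        print(f"archive is {ratio:.1%} of the original size")
```

It is worth running this on a representative real dataset before planning around it, since noisy image data often compresses only modestly.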

        If server space is limited, you may consider archiving raw (or all) data to tape and keeping only active data on the server. Ask if you want to know more: tapes are cheap and disks die easily.

        What do other folk do?

        cheers
        Grahame

        • This reply was modified 5 years ago by AdminEM3. Reason: size, clarity
        #554
        AdminEM3
        Keymaster

          Of course the FIB sets are much smaller, so they are less of an issue.

          #565
          Emily K. Benson
          Keymaster

            I can help with SBFI dataset size. Someone else asked this recently, so I looked into it more closely, using file timestamps to work out how much data we collect per day.

            The usual scenario:
            20 – 80 GB raw data per 24 hour period.

            Worst-case scenario:
            There have been a couple of times when it’s been 180 – 200 GB per 24-hour period (2 fields at the same time: a large 10 nm/pixel field of 220 µm x 60 µm at very fast acquisition, and a small 5 nm/pixel field of about 20 µm x 20 µm).

            While those datasets were large, that was partly because the current version of MAPS saved large, full-frame images for the 20K and 40K frames, although I was only collecting data for a “reduced area” of 220 µm x 60 µm. As MAPS also saved the correctly sized images in the Plugins folders, we elected to keep only those and discard the full-frame images.

            On top of that, I always allow more space (at least double to triple) for post-processing: creating registered stacks, stitched stacks, etc. Alternatively, you can do most things on external hard drives and put only the raw and finished versions on the server, so that intermediate sets don’t take up server space.
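To turn these daily rates into a number for IT planning, something like the sketch below may help. It reads "double to triple" as the total allowance relative to the raw data, which is an interpretation rather than something stated above; the run length of 3 days and the function name are purely illustrative.

```python
def storage_budget_gb(daily_raw_gb: float, days: float, overhead: float) -> float:
    """Space to reserve for one acquisition, raw data included.

    overhead is the assumed total allowance as a multiple of the raw size
    (2 to 3, following the rule of thumb above).
    """
    if min(daily_raw_gb, days, overhead) < 0:
        raise ValueError("inputs must be non-negative")
    return daily_raw_gb * days * overhead


if __name__ == "__main__":
    days = 3  # hypothetical run length, not a figure from the thread
    scenarios = {
        "usual, low end": (20, 2),
        "usual, high end": (80, 3),
        "worst case": (200, 3),
    }
    for label, (daily_gb, overhead) in scenarios.items():
        budget = storage_budget_gb(daily_gb, days, overhead)
        print(f"{label}: reserve about {budget:.0f} GB")
```

If intermediate stacks live on external drives instead, the server figure drops toward the raw plus finished sizes only.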

            • This reply was modified 5 years ago by Emily K. Benson.
            • This reply was modified 5 years ago by AdminEM3. Reason: edited for length and overlap gjk