At the Broad Institute, we split a CellProfiler analysis into a number of small jobs which are run on separate cores in headless mode. CellProfiler is optimized to run analyses headless on a single thread in order to get predictable concurrency, with one CellProfiler instance per blade core. There are command-line switches that let you execute a partial analysis, switches that let you specify the inputs and outputs, and modules whose primary target environment is a lab information system. This page describes some best practices for integrating CellProfiler into your LIMS workflow.

CellProfiler can also be run in the cloud to distribute jobs among many machines. We have scripts and configuration files for running CellProfiler in distributed mode using the Amazon Web Services platform. If you are interested in this setup, visit the Distributed-CellProfiler project page to get more details. You might also wish to take a look at a paper led by Novartis describing Jenkins-CI, "an Open-Source Continuous Integration System, as a Scientific Data and Image-Processing Platform".

Strategy choice: LoadData and/or CreateBatchFiles
Input modules and --file-list (new in CP 2.1.2)
Handling output from multiple analyses using the same pipeline

Strategy choice: LoadData and/or CreateBatchFiles

CellProfiler has two modules which have been designed to work in a server-farm environment: LoadData and CreateBatchFiles. They can be used together or separately, with the choice being largely determined by your analysis workflow.

Each row of the .csv file supplies the input data for one execution cycle of the researcher's pipeline, and each column supplies either image file location information or metadata such as physical location (plate, well and site) or sample treatment. A .csv for LoadData can be generated using a SQL query on a LIMS database. For example, a researcher could submit a request for robotically-prepared plates to be analyzed by CellProfiler, together with a pipeline or pipelines to be run. The LIMS would generate a .csv using a query whose fields specify the per-channel image file locations, the plate, well and site, and the perturbant for the well; the pipeline and .csv could then be farmed out to a number of execution nodes.

LoadData can reference images by file name or by URL, with "file:", "http:", "ftp:" and the special-case "omero:" URL schemes being supported. The "omero:" scheme allows CellProfiler to download images from an OMERO server.

CellProfiler (as of c2cb201) can take the name of the LoadData .csv file from the command-line if you supply it using the --data-file switch. This means that a researcher can create a pipeline using a short list of representative images and a short LoadData file, then submit the pipeline to the LIMS. The LIMS can then generate .csv files for the researcher and replace the researcher's example .csv without modification to the pipeline.

Input modules and --file-list (new in CP 2.1.2)

CellProfiler desktop users may be more comfortable using the input modules to build image sets from a list of files. These pipelines can be used in a headless environment if a file list is supplied on the command-line using the --file-list switch. This file list takes the place of the one that the user assembles in the Images module. The best practice is to use one file list per job, without the -f and -l switches, to minimize the time spent assembling each job's image sets. One of the advantages of this approach is the simplicity of the deployment: the file list can be constructed from a simple directory listing, and the logic that builds the image set can be the responsibility of the researcher rather than the LIMS.

The CreateBatchFiles module is placed at the end of a pipeline. The pipeline is then executed and the result is a Batch_data.h5 (HDF5) file which contains the image set list and pipeline. This file can then be submitted to CellProfiler on the command-line, and CellProfiler will run in batch mode, without its user interface, to process the pipeline. Typically, a user or LIMS will break a long image set list into pieces and execute each of these pieces using the command-line switches -f and -l to specify the first and last image sets in each job. The advantage of CreateBatchFiles from the researcher's perspective is that the Batch_data.h5 file generated by the module captures all of the data needed to run the analysis. The researcher is both responsible for and in control of the choice and layout of files, and in many situations all that is needed is a mechanism for farming out the jobs to cluster nodes (alternatively, the researcher can submit the jobs manually to their cluster).
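As a rough illustration of splitting an analysis into small jobs with the -f and -l switches, the Python sketch below computes first/last image set ranges and assembles one command line per job. It is a minimal sketch under stated assumptions, not the Broad Institute's actual tooling: the helper names `batch_ranges` and `build_commands` are hypothetical, the image set indices are assumed to be one-based and inclusive, and the `-c`, `-r` and `-p` switches (headless, run, pipeline file) are not described on this page, so check them against `cellprofiler --help` for your version.

```python
def batch_ranges(n_image_sets, batch_size):
    """Yield (first, last) image set ranges covering 1..n_image_sets.

    Assumption: indices are one-based and inclusive, matching how the
    -f and -l switches are described on this page.
    """
    if n_image_sets < 0 or batch_size < 1:
        raise ValueError("n_image_sets must be >= 0 and batch_size >= 1")
    for first in range(1, n_image_sets + 1, batch_size):
        last = min(first + batch_size - 1, n_image_sets)
        yield first, last


def build_commands(batch_file, n_image_sets, batch_size,
                   executable="cellprofiler"):
    """Return one argument list per job (hypothetical helper).

    Assumption: -c (no GUI), -r (run) and -p (pipeline file) are the
    switches used to run a Batch_data.h5 file headless; only -f and -l
    come from the text above.
    """
    return [
        [executable, "-c", "-r", "-p", batch_file,
         "-f", str(first), "-l", str(last)]
        for first, last in batch_ranges(n_image_sets, batch_size)
    ]


if __name__ == "__main__":
    # Print the jobs for a 10-image-set analysis in batches of 4.
    for cmd in build_commands("Batch_data.h5", 10, 4):
        print(" ".join(cmd))
```

Each argument list could be handed to a cluster scheduler or to `subprocess.run`. Keeping the range arithmetic in its own function makes it easy to test without CellProfiler installed.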