De Novo Genome Sequencing

Nucleome is India’s only service provider of whole-genome sequencing and hybrid assembly services using Illumina NovaSeq 6000 and PacBio Sequel II. To sequence the genome of an organism, we use PacBio Sequel II, optical mapping and Hi-C approaches. We have completed genome sequencing and assembly of many organisms, including microbes, fungi, plants, insects and animals, using Illumina NovaSeq 6000 and PacBio Sequel II. Depending on the sample type and available budget, we can design a suitable strategy for your project. Visit the PacBio Sequel II service page for more information. If you wish to sequence a vertebrate genome, we are the most trained team in India. In partnership with the Vertebrate Genomes Project (VGP), Nucleome is working with Indian researchers to sequence 100 vertebrate genomes from India.

For very high-quality genome assembly, we use PacBio Sequel II data, scaffold with Bionano datasets and finally use Hi-C to build chromosome-level scaffolds. We assess and curate the assembly multiple times to ensure that scientists get error-free scaffolds from the available data sets. We were involved in sequencing the genomes of Tiger, Great Indian Bustard, Mango, Pigeon Pea, Pomegranate, Mithun and many microbes and fungi.


Our team can assist you in designing the project, selection of technology, sequencing and analysis. Call our team now at +91 40 4011 4169 or send a query to info @

Molecular markers

Molecular marker technology enables plant breeders to select individual plants based on their marker pattern (genotype) rather than their observable traits (phenotype).  This process is called marker assisted breeding (MAB) or marker assisted selection (MAS).  MAB offers many benefits, including speeding up a plant breeding program’s progress, increasing accuracy and efficiency, and decreasing costs.

Marker-assisted selection (MAS) is a method of selecting desirable individuals in a breeding scheme based on DNA molecular marker patterns instead of, or in addition to, their trait values. When used in appropriate situations, it is a tool that can help plant breeders select more efficiently for desirable crop traits. However, MAS is not always advantageous, so careful analysis of the costs and benefits relative to conventional breeding methods is necessary.
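The selection step described above can be sketched in a few lines of Python. This is a minimal illustration only: the `Plant` record, the single marker with alleles "A" and "B", and the trait values are all hypothetical and are not taken from any real breeding data set.

```python
from dataclasses import dataclass


@dataclass
class Plant:
    plant_id: str
    marker_genotype: str  # two alleles at one hypothetical marker, e.g. "AA", "AB", "BB"
    trait_value: float    # hypothetical phenotype score, kept only for comparison


def select_by_marker(plants, favourable_allele, require_homozygous=True):
    """Return the plants whose marker genotype carries the favourable allele."""
    selected = []
    for plant in plants:
        copies = plant.marker_genotype.count(favourable_allele)
        if copies == 2 or (copies == 1 and not require_homozygous):
            selected.append(plant)
    return selected


plants = [
    Plant("P1", "AA", 2.1),
    Plant("P2", "AB", 3.4),
    Plant("P3", "BB", 7.8),
    Plant("P4", "AA", 2.6),
]

homozygous = select_by_marker(plants, "A")
carriers = select_by_marker(plants, "A", require_homozygous=False)
print([p.plant_id for p in homozygous])  # ['P1', 'P4']
print([p.plant_id for p in carriers])    # ['P1', 'P2', 'P4']
```

Distinguishing "AB" from "AA" in this way is only possible with a co-dominant marker, which is why co-dominance appears among the desirable marker qualities.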

Different marker types have variable characteristics. Desirable qualities of molecular markers include the following:

  • Polymorphic
  • Reproducible
  • Evenly distributed across the whole genome (not clustered in particular regions)
  • Inexpensive
  • Easy to analyse
  • Co-dominant (so that heterozygotes can be distinguished from homozygotes)
  • Possibility of being outsourced

Until recently, a number of marker types were available, including RAPDs (Random Amplification of Polymorphic DNA), AFLPs (Amplified Fragment Length Polymorphism), ESTs (Expressed Sequence Tags), etc. For the most part, these older markers are now obsolete because they do not meet these criteria as well as newer markers do. The two main marker types used today are:

  • SSRs (Simple Sequence Repeats, also called Microsatellites)
  • SNPs (Single nucleotide polymorphisms)

SNPs have many advantages, including high throughput and low cost. SSRs, an older marker type, are still in use as of this writing, so they are also covered here. We briefly describe both marker types below, starting with SSRs.

Microsatellites (SSRs)

Microsatellites, also called simple sequence repeats (SSRs), are tandemly arranged blocks of short nucleotide sequences, usually 1-10 nucleotides long (though more typically 2 or 3), repeated up to 50 times within the plant genome. The number of repeat units in the block can vary noticeably between individuals within a species. This variation can be targeted by PCR, by placing the primers on either side of the block. This yields markers that are highly reproducible, co-dominant, polymorphic and easy to analyze. As a result, SSRs are among the most widely used markers in MAB.
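To make the idea of a tandem repeat block concrete, here is a minimal Python sketch that scans a sequence for SSR-like blocks with a regular expression. The function name `find_ssrs`, its default thresholds (motifs of 2 to 3 nucleotides, at least 4 repeats) and the toy sequence are assumptions chosen for illustration; dedicated SSR-mining tools handle many more cases.

```python
import re


def find_ssrs(sequence, min_unit=2, max_unit=3, min_repeats=4):
    """Return (start, motif, repeat_count) for each tandem repeat block found.

    start is 0-based. Homopolymer runs such as "AAAAAAAA" are skipped.
    """
    # A short motif (captured lazily, shortest first) followed by at least
    # min_repeats - 1 further copies of itself.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # ignore single-nucleotide runs
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits


# Toy sequence: (CA)5 and (AGT)4 separated by non-repetitive bases.
toy_sequence = "GG" + "CA" * 5 + "GGTC" + "AGT" * 4 + "CC"
print(find_ssrs(toy_sequence))  # [(2, 'CA', 5), (16, 'AGT', 4)]
```

Two individuals that differ in `repeat_count` at the same locus would give PCR products of different lengths, which is the polymorphism an SSR marker scores.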

SNPs (Single nucleotide polymorphisms)

SNPs (pronounced “snips”) are differences in DNA sequence of just one (or sometimes a small number of) nucleotides. Where these differences occur within a genic sequence, they are more often than not phenotypically neutral, but sometimes they can be associated with a change in the amino acid sequence of the gene product. They are very common and are distributed throughout the genome. SNP genotyping can be relatively simple, but SNP discovery generally requires extensive DNA sequencing. However, because much of the procedure is automated, SNP genotyping costs much less than earlier marker types, including SSRs, so SNPs now dominate the molecular breeding field.

Genotyping is the process of determining the genetic constitution (the genotype) of an individual plant by examining its DNA sequence using biological assays and comparing it to another individual’s sequence or a reference sequence.
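The comparison against a reference can be sketched as follows. This is a simplified illustration that assumes the two sequences are already aligned and of equal length; the function name `find_snps` and the toy sequences are hypothetical, and real genotyping pipelines work from read alignments rather than from two finished sequences.

```python
def find_snps(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based. Both sequences must be pre-aligned and equal in length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for index, (ref_base, alt_base) in enumerate(zip(reference.upper(), sample.upper())):
        # Only report substitutions between unambiguous bases.
        if ref_base != alt_base and ref_base in "ACGT" and alt_base in "ACGT":
            snps.append((index + 1, ref_base, alt_base))
    return snps


reference_seq = "ATGCCGTAAGCT"
sample_seq = "ATGACGTAAGTT"
print(find_snps(reference_seq, sample_seq))  # [(4, 'C', 'A'), (11, 'C', 'T')]
```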

Plant phenotyping is the comprehensive assessment of complex plant traits such as growth, development, tolerance, resistance, architecture, physiology, ecology and yield, together with the basic measurement of the individual quantitative parameters that form the basis for the more complex traits. Examples of such directly measured parameters are image-based projected leaf area, chlorophyll fluorescence, stem diameter, plant height/width, compactness, stress pigment concentration, tip burn, internode length, colour, leaf angle, leaf rolling, leaf elongation, seed number, seed size, tiller number, flowering time, germination time, etc.

Optical Mapping

Optical mapping is a molecular technique that produces fingerprints of DNA sequences in order to construct genome-wide maps. The sequence markers can be ordered restriction fragments or specific sequence motifs (nick sites). The optical mapping procedure first stretches relatively intact (minimally sheared) linear DNA fragments on a glass surface or in a nanochannel array, and then directly images the locations of the restriction sites or sequence motifs under light microscopes, with the aid of a dye or fluorescent label. Optical mapping has been widely used to improve de novo plant genome assemblies, including rice, maize, Medicago, Amborella, tomato and wheat, with more genomes in the pipeline. We use the Bionano Irys system to offer an optical mapping service that provides long-range information about the genome and can more easily identify large structural variations. The ability of optical mapping to assay long single DNA molecules nicely complements short-read sequencing, which is more suitable for identifying small and short-range variants.
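The kind of fingerprint an optical map records can be imitated in silico. The sketch below is a minimal Python illustration that locates every occurrence of a motif on the forward strand and reports the ordered distances between sites. The function name `in_silico_digest`, the motif GCTCTTC and the toy sequence are illustrative assumptions; the sketch ignores the reverse strand and the sizing errors of a real instrument.

```python
def in_silico_digest(sequence, motif):
    """Return (site_positions, fragment_lengths) for one motif.

    site_positions are 0-based starts of each motif occurrence on the forward
    strand; fragment_lengths are the ordered distances between consecutive
    sites, including both sequence ends.
    """
    sequence = sequence.upper()
    motif = motif.upper()
    sites = []
    start = sequence.find(motif)
    while start != -1:
        sites.append(start)
        start = sequence.find(motif, start + 1)
    boundaries = [0] + sites + [len(sequence)]
    # Drop zero-length fragments, which arise when a site sits at position 0.
    fragments = [b - a for a, b in zip(boundaries, boundaries[1:]) if b > a]
    return sites, fragments


toy_contig = "AAAA" + "GCTCTTC" + "T" * 9 + "GCTCTTC" + "CC"
sites, fragments = in_silico_digest(toy_contig, "GCTCTTC")
print(sites)      # [4, 20]
print(fragments)  # [4, 16, 9]
```

The ordered list of fragment lengths is the pattern that would be compared between a sequence contig and an optical map.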

Optical Map guided genome assembly

Optical mapping can assist the assembly process in several ways when building high-quality reference genomes. De novo constructed optical maps offer independent evidence to connect and bridge adjacent sequence contigs or scaffolds. Genome assemblies guided by optical maps consist of three key computational steps. The initial step is the de novo assembly of optically mapped molecules to construct a ‘consensus’ optical map from single DNA molecules at high redundancy. The consensus map has to deal with errors specific to optical mapping, including missing cuts, false cuts, inaccurate fragment sizes, and chimeric maps. The next step is to align the in silico digested contig sequences to the consensus optical map. The final step is the joining of neighbouring contig sequences to construct supercontigs on the basis of their locations on the optical map. For small microbial genomes, the resulting assemblies can contain a single stretch of sequence that spans the entire genome, while for large eukaryotic genomes the combined efforts of sequencing and optical mapping often result in a substantially increased scaffold N50. In several cases, the mapping data allow the reconstruction of entire chromosomes. Beyond ordering and orienting contigs, optical maps provide an additional layer of validation for the sequence assemblies. Optical maps can potentially identify and resolve misassemblies, that is, false joins, inversions or translocations introduced as artifacts during sequence assembly.
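Scaffold N50, mentioned above, is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal Python sketch of the calculation follows; the contig and scaffold lengths are made-up numbers used only to show how joining contigs raises N50.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


contig_lengths = [100, 200, 300, 400, 500]  # hypothetical contigs
scaffold_lengths = [1000, 500]              # same bases after hypothetical joins
print(n50(contig_lengths))    # 400
print(n50(scaffold_lengths))  # 1000
```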

Genome Resequencing

The whole-genome resequencing approach can be used to study the underlying mechanisms of species origin, development, growth, and evolution. With whole-genome resequencing, the complete genome data from one or more variants can be aligned to a known reference genome of the species. Applications include detection of genetic differences between variants, transposon fingerprinting for assessing germplasm diversity and lineages, and mapping loci associated with specific traits, such as disease resistance.


We prefer Illumina sequencers for genome resequencing. PCR-free libraries and a minimum coverage of 30x are recommended for better results.

Human Genome Sequencing

Nucleome offers highly precise, inclusive human whole-genome sequencing services, giving researchers, physicians and patients the clearest picture of the genome. During whole-genome sequencing, we collect a DNA sample and then determine the identity of the 3 billion nucleotides that compose the human genome. Today, most genetic testing focuses on one or a few genes rather than the entire genome. However, with the availability of a human genome sequencing service in India at Nucleome Informatics, more individuals are now pursuing this option. Physicians can look at an entire genome to see how specific treatments for a disease will be affected by an individual’s unique genome. For example, the physician may opt to look at genes involved in drug metabolism when deciding dosage. In the future, whole-genome sequencing may enable everyone to develop a personalized treatment plan.

Advantages of Whole Genome Sequencing

  • Creating personalized plans to treat disease may be possible based not only on the mutant genes causing a disease, but also other genes in the patient’s genome.
  • Genotyping cancer cells and understanding what genes are misregulated allows physicians to select the best chemotherapy and potentially expose the patient to less toxic treatment since the therapy is tailored.
  • Previously unknown genes may be identified as contributing to a disease state. Traditional genetic testing looks only at the common “troublemaker” genes.
  • Lifestyle or environmental changes that can mediate the effects of genetic predisposition may be identified and then moderated.

Sequencing Strategy

  • 350 bp insert DNA library
  • Illumina HiSeq platform, paired-end 150 bp
  • Sample requirements
    • DNA amount quantified by Qubit 3.0
    • For fresh sample: ≥2.0 μg (for two library preps); minimum: 500 ng
    • For FFPE sample: ≥3.0 μg (for two library preps); minimum: 1 μg
    • DNA concentration: ≥20 ng/μL
    • Total volume: ≥10 μL
    • Purity: OD260/280 = 1.8-2.0 without degradation or RNA contamination
    • Turnaround time: within 45 days from sample verification, plus an additional 15 days for standard bioinformatics analysis
  • Recommended Sequencing Depth
    • For normal sample: effective sequencing depth 30X
    • For tumor sample: effective sequencing depth 50X
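
As a rough guide to what these depths mean in practice, the following Python sketch estimates how many 150 bp read pairs are needed to reach a target depth on a genome of about 3 billion nucleotides. It uses the simple relation depth = sequenced bases / genome size. The function name `read_pairs_needed` is hypothetical, and the estimate is for raw depth only: it makes no allowance for duplicate, low-quality or unmapped reads, so the effective depth will be somewhat lower.

```python
import math


def read_pairs_needed(target_depth, genome_size_bp, read_length_bp=150):
    """Estimate the paired-end read pairs required for a raw target depth."""
    bases_needed = target_depth * genome_size_bp
    # Each pair contributes two reads of read_length_bp bases.
    return math.ceil(bases_needed / (2 * read_length_bp))


HUMAN_GENOME_BP = 3_000_000_000  # approximate size stated in the text above

normal_pairs = read_pairs_needed(30, HUMAN_GENOME_BP)
tumor_pairs = read_pairs_needed(50, HUMAN_GENOME_BP)
print(normal_pairs)  # 300000000 pairs, i.e. about 90 Gb of raw data
print(tumor_pairs)   # 500000000 pairs, i.e. about 150 Gb of raw data
```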

Bioinformatic Analysis

  • Data quality control: filtering reads containing adapter or with low quality
  • Alignment with reference genome, statistics of sequencing depth and coverage
  • SNP/InDel/SV/CNV calling, annotation and statistics
  • Somatic SNP/InDel/SV/CNV calling, annotation and statistics (paired tumor samples)

For more information, please visit our YouTube channel.

If you are interested in sequencing your genome and need more information, please write to us at

Exome Sequencing

Nucleome offers exome sequencing, a cost-effective alternative to whole-genome sequencing, as it targets only the protein-coding regions of the human genome, which account for a majority of known disease-related variants. Whether you are conducting studies in rare Mendelian disorders, complex disease, cancer research, or human population studies, Nucleome’s comprehensive whole-exome sequencing service provides a high-quality, affordable and convenient solution. We use the Agilent SureSelect Human All Exon V5/V6 Kit for exome capture and Illumina NovaSeq 6000 for sequencing. We guarantee that ≥ 80% of bases have a sequencing quality score ≥ Q30, which exceeds Illumina’s official guarantee of ≥ 75%.
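A quality score of Q30 corresponds to an estimated base-calling error probability of 1 in 1,000, since Q = -10 log10(P). The Python sketch below shows how the percentage of bases at or above Q30 could be computed from FASTQ quality strings. It assumes the common Phred+33 ASCII encoding; the function names and the two toy quality strings are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(character) - offset for character in quality_string]


def error_probability(phred_score):
    """Convert a Phred score into the estimated base-calling error probability."""
    return 10 ** (-phred_score / 10)


def fraction_at_least(quality_strings, threshold=30):
    """Return the fraction of all bases with a Phred score >= threshold."""
    total = 0
    passing = 0
    for quality_string in quality_strings:
        for score in phred_scores(quality_string):
            total += 1
            if score >= threshold:
                passing += 1
    if total == 0:
        raise ValueError("no bases to evaluate")
    return passing / total


# Toy reads: 'I' decodes to Q40, '?' to Q30, '5' to Q20 and '+' to Q10.
toy_qualities = ["IIIIIIII5I", "?????+++??"]
q30_fraction = fraction_at_least(toy_qualities)
print(error_probability(30))  # 0.001
print(q30_fraction)           # 0.8, i.e. 80% of bases at Q30 or better
```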

Exome Sequencing applications:

  • SNV/Indels Discovery
  • Copy Number Variation Discovery
  • Trio Analysis

Exome Capture Kits:

  • SureSelect XT All Exon V5 Kit
  • SureSelect XT All Exon V6 Kit
  • SureSelect XT custom Tier 1

Sequencing Platform

  •  NovaSeq 6000

Standard Data Analysis

  • Variant Calling (SNPs/InDels) & Annotation

Advanced Data Analysis

  • CNV (Copy Number Variation)
  • Multiple variant-calling pipelines
  • Cancer Analysis/ Family Analysis / Population Analysis

Custom Sequencing

  • Variant Calling (SNPs/InDels) & Annotation

Sample conditions

  • DNA amount: 1 μg or more; volume: 20 μL or more; concentration: 20 ng/μL or more
  • FFPE amount: 1.5 μg or more
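
Before shipping, a sample can be checked against these conditions with a small script. The Python sketch below is a hypothetical helper, not part of any Nucleome tool. It assumes that the volume and concentration thresholds also apply to FFPE samples, which the conditions above do not state explicitly.

```python
def check_exome_sample(amount_ug, volume_ul, concentration_ng_per_ul, is_ffpe=False):
    """Return a list of problems; an empty list means the sample passes."""
    # Thresholds copied from the sample conditions above.
    min_amount_ug = 1.5 if is_ffpe else 1.0
    problems = []
    if amount_ug < min_amount_ug:
        problems.append(f"amount {amount_ug} ug is below {min_amount_ug} ug")
    # Assumption: volume and concentration limits are the same for FFPE samples.
    if volume_ul < 20:
        problems.append(f"volume {volume_ul} uL is below 20 uL")
    if concentration_ng_per_ul < 20:
        problems.append(f"concentration {concentration_ng_per_ul} ng/uL is below 20 ng/uL")
    return problems


fresh_result = check_exome_sample(1.2, 25, 48)
ffpe_result = check_exome_sample(1.2, 25, 48, is_ffpe=True)
print(fresh_result)  # []
print(ffpe_result)   # ['amount 1.2 ug is below 1.5 ug']
```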

We also offer custom bioinformatics analysis that includes data QC, mapping to the reference genome, SNP/InDel calling, somatic SNP/InDel calling, statistics and annotation. We are also developing a panel for inherited retinal disease specific to the Indian population. Contact us if you wish to collaborate with us on an exome sequencing study.

Single Cell DNA Sequencing

The genomic heterogeneity of cell populations can be explored at the level of the individual cell. Genetic changes, such as point mutations and copy number variation occurring during disease and normal development processes, are profiled using the minute amounts of DNA from single cells. Applications include analysis of genetic heterogeneity within unicellular and multicellular organisms, detection of chromosomal anomalies in germ line cells, preimplantation genomic screening of embryos, and defining the genetic composition of tumors for developing more targeted therapies.


The single cell DNA sequencing service includes sample QC, amplification, library preparation, sequencing and bioinformatics analysis. We use the MALBAC (multiple annealing and looping based amplification cycles) PCR-based method, which provides uniform data while reducing rates of false positives and false negatives.

Genotyping By Sequencing

This application is used to compare genotypes through the mapping of large numbers of SNPs or other markers. Genotyping by sequencing (GBS) is a rapid and cost-effective approach which uses a restriction enzyme digestion step to reduce genome complexity, so GBS can be applied to large genomes, and the end reads of the restriction fragments allow variants to be compared when no reference genome is available.


Applications of genotyping by sequencing include tracking plant and animal genotypes in breeding programs and conservation projects, examining the diversity of natural populations, discovery of new genetic markers, and screening variants prior to whole genome re-sequencing.