Ratz corner: 2012

One of the main tasks of researchers is to publish their findings, and charts and graphs are indispensable tools in research publications. Microsoft Excel and similar programs often give ample options for creating attractive charts. But at times creating even a simple chart becomes a complicated affair, especially if you want to avoid extensive reading and searching. For some charts there are readily available templates into which we can simply copy and paste our data.

Here I am giving a list of charts, graphs and simple statistical calculation tools available online. Please bear in mind that some of them may carry copyright declarations requiring that the authors/publishers be acknowledged, though all of them are free of charge. Kindly use your discretion while using the tools.

Plots/Charts

1) Box and Whisker plots

These plots are useful when you are showing variation in the data with respect to a particular parameter. An example of such data is the expression of a gene under various conditions.

Where to get it: A template for making this kind of graph is available at http://www.vertex42.com/ExcelTemplates/box-whisker-plot.html .
The template can be downloaded from that site (the "download now" option) and opened in Excel or a similar program. The various options are explained on the website. Kindly read the terms of use at the site.
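For readers who want to check the numbers behind a box and whisker plot, here is a minimal Python sketch of the five values such a plot is drawn from (minimum, first quartile, median, third quartile, maximum). The function name and the expression values are made up for illustration, and the quartile convention used here (the "inclusive" method of Python's statistics module) is an assumption; the Excel template may compute quartiles slightly differently.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    data = sorted(values)
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Made-up expression values of one gene in one condition.
expression = [2.1, 3.4, 3.9, 4.4, 5.0, 5.2, 6.8]
print(five_number_summary(expression))
```

The box spans Q1 to Q3 with a line at the median, and the whiskers extend towards the minimum and maximum.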

Statistical tests

Disclaimer: I am not an expert in statistics myself. Kindly make sure you understand each of the statistical tests yourselves. These are the tests I found useful in my own studies; use them only at your own discretion. The aim here is only to make online tools easy to find, not to cover the educational aspects.

1) Fisher's Exact Test

This type of test is most useful for 2 by 2 contingency tables. In simple terms, it is useful when you have data on two variables, each with two categories (e.g. yes/no answers from men and women to the question of whether they drink coffee: 4 counts in total, male-yes, male-no, female-yes and female-no, but only two variables, viz. gender and answer to the question).

Where to get it: Online tests can be conducted at
http://www.langsrud.com/fisher.htm

The tab on the left panel of the page (below the compute option) has 4 columns, which need to be filled with the counts for the 4 criteria. An explanation of the operations is available on the webpage itself.
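To show what such a calculator does with the 4 counts, here is a minimal Python sketch of a two-sided Fisher's exact test. It is an illustration under my own assumptions, not the code behind the website: the function name is hypothetical, the counts are made up, and "two-sided" is taken to mean summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when the row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small tolerance so that ties are not lost to rounding.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Made-up counts: men (3 yes, 1 no), women (1 yes, 3 no).
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"{p_value:.4f}")  # prints 0.4857
```

With such small counts the p-value is large, so these made-up data would give no evidence that coffee drinking differs between men and women.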

2) Mann-Whitney U test (also known as the Wilcoxon rank-sum test)

This type of test is most useful when your data are not known to follow a normal distribution. In simple terms, it applies when there is no certainty that the data will follow a particular pattern. For example, the price of gold on each day of a month is not expected to follow a pattern, whereas the ages of people in a village will follow a normal distribution: the graph of the number of people at each age will look like a bell, with a few persons in the lower age group, a peak between 20-40 years of age, and again few people from 60 onwards.
In research, the expression of genes in tissues/cells may not follow a normal distribution, and the Mann-Whitney test is useful for this type of result. The expression of a gene x in 2 conditions (say stress and no stress) can be compared for a statistically significant difference with the Mann-Whitney U test.

Where to get it: Online tests can be conducted at
http://elegans.som.vcu.edu/~leon/stats/utest.html

The data for each variable can be pasted directly from an Excel or similar sheet into the 4th tab (titled data set1 and data set2), and the statistics can then be calculated.
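As a rough illustration of what the test computes, here is a minimal Python sketch of the U statistic itself: both data sets are ranked together (tied values share the average rank) and U is derived from the rank sum of the first set. The function name and the expression values are made up, and the sketch stops at the statistic; turning U into a p-value needs tables or a normal approximation, which the online tool handles.

```python
def mann_whitney_u(sample1, sample2):
    """Return (U1, U2) for two lists of numbers, using average ranks for ties."""
    # Tag every value with the sample it came from, then sort all together.
    combined = sorted(
        [(value, 0) for value in sample1] + [(value, 1) for value in sample2]
    )
    rank_sum1 = 0.0
    i = 0
    while i < len(combined):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            if combined[k][1] == 0:
                rank_sum1 += average_rank
        i = j
    n1, n2 = len(sample1), len(sample2)
    u1 = rank_sum1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return u1, u2


# Made-up expression values of gene x in two conditions.
no_stress = [1.2, 1.5, 1.9, 2.3, 2.4]
stress = [2.2, 2.8, 3.1, 3.5, 4.0]
print(mann_whitney_u(no_stress, stress))
```

The smaller of the two U values is the one usually compared against the critical value.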

To be concluded...

Many times in research we come across specific tasks that need a specific tool or software. Our best friend, Google search, most often gives us more options than can be deciphered immediately. Here I will try to introduce a few freely available utilities, such as software and tools, useful for academics, along with a simple introduction to their functions. Please use them with discretion, as suits you best. Kindly acknowledge the corresponding authors in your communications. I hereby acknowledge the individual providers/developers of the software/tools listed below. Copyright infringement, if any, may kindly be brought to my notice.

Most of the tools listed below are for human genome/biology and the emphasis is on human biology, but they may also do good for other applications.

Primer designing.

For PCR-
direct from genomic DNA -exons, introns, UTRs, promoters.

useful links

1) http://www.ncbi.nlm.nih.gov/ - NCBI database
2) http://www.ensembl.org/index.html - Ensembl database
3) http://frodo.wi.mit.edu/ - Primer3 software for primer design
4) http://www.ncbi.nlm.nih.gov/tools/primer-blast/ - Primer BLAST from NCBI for primer design.
5) http://www.operon.com/tools/oligo-analysis-tool.aspx - Tm and complementarity analysis tool

Most of us rely on software to design primers. But when it comes to designing good primers that work in one go, it is almost impossible to predict whether the designed primers will do their job. There are no foolproof methods, but you can increase the probability of finding a good primer by following a few steps.

simple steps:

1) Make sure the region for which the primer is to be designed is accurate.
Many times polymorphisms in the primer regions play spoiler. Also, some genes have strong sequence similarity to pseudogenes or variants. Make sure that the region for which primers are to be designed is free of known polymorphisms as far as possible.

The NCBI database (http://www.ncbi.nlm.nih.gov/) lists all the annotated genes for which sequence can be obtained.
But a more user-friendly database is the Ensembl genome browser (http://www.ensembl.org/index.html), which lists the organisms for which sequences are available.

In the Ensembl browser one can simply enter the gene symbol (example: ACTB for the beta actin gene) or the gene name as you know it.
The search will give you a page of matches to the search name under 4 subheadings: a) domain b) family c) gene d) transcripts. Select the gene option.

By clicking on the gene you will be given the list of organisms for which the gene details are available.
(Sometimes, to avoid confusion, the genus name is provided, e.g. fugu for puffer fish.)

Clicking on a species in the list will again give you the 4 subheadings: a) domain b) family c) gene d) transcripts.
Select the gene option here again.

Genes with similar names will be listed. As a word of caution, make sure that the gene you select is exactly the one you are searching for. As an example, P53 will give you 204 or so results, some very similar to P53, but TP53 is what you should go for.

Here you can do a Google search for the gene ID given on the page to make sure it is correct for the organism (specifically from published literature).

On the left tab there will be Gene based display - gene summary, e.g. http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000141510;r=17:7565097-7590856

There will also be a gene summary tab lower on the page, which can be checked for the transcripts coded in the region, with exons shown as blocks. Checking this for the transcript variants will make sure the exons/introns selected are correct: some exons are used only with an alternate promoter, so exon 2 may actually be exon 1 in a common transcript. It will also give us an idea of whether the gene is coded on the negative strand, and of the genes upstream or downstream.

opt for the sequence

The entire sequence will be provided with exons, 3' and 5' UTRs.

You may select the configure this page option on the left tab to choose whether you want to see variations, additional exons etc. It is always good to opt for variations ('yes' in the variations tab in configure this page), since it helps to design primers from regions where there are no known variations (SNPs).

Again, to make sure the exon/intron number you are searching for is correct, you may go back to the previous page, see the tab Gene based display - gene summary and select the splice variants (if any).

Clicking the splice variants option gives you a new page with a graphical representation of the absolute position of each transcript and the exons used. Each transcript ID will also be available under Transcripts+ (clicking + will give the transcript information). Search for the transcript ID in Google to confirm that it is the correct transcript of the gene. The mRNA length/protein length will also be useful (simple trick: multiply the protein length in amino acids by 113 to get the approximate protein size in daltons; 110 is another commonly used average).
The transcripts with CCDS (consensus CDS) numbers are well annotated and are possibly what you are looking for. Going back to the sequence page, you can select the exon, intron or region of interest and copy it to an MS Word or similar document. Make sure that flanking sequences, at least 100 bases on each side, are selected.

After selecting the region of interest

If your aim is to sequence the exons from patient/specimen samples, make sure you add ~50 bases on either side of the target sequence, especially from the introns. Sanger sequencing is unreliable for the first 30-50 nucleotides from the start, so bidirectional sequencing is recommended. This approach will also tell you whether splice sites are affected (10 nucleotides upstream of the exon start and 10 nucleotides after the exon end).

Method-I
Now

1) Select the entire region and paste it into the Primer3 software http://frodo.wi.mit.edu/, available free online.
Select a mispriming library at the top of the page (3 organisms available), if one applies, or leave it as none.

2) Tick both the left and right primer boxes.

3) Leave all other settings as they are and click pick primers.

Now it will give you the primers in the region... if no primers are available, the reasons will be listed.

This does not finish the task; it is only a preliminary check of whether there are acceptable primers in the region. Some regions are repeat-heavy or GC-rich and will not yield primers.

Now go back to the word document

Put square brackets ([ ]) immediately outside the targeted region (including the whole exon and 30-50 nucleotides upstream and downstream of the exon region of interest). The aim is that Primer3 will not select primers inside the bracketed region; only primers flanking the brackets will be selected.

eg: GCATTGTAGTCTTCCCACCTCCCA[GATGGCGGAGGGCAAGTAGCAAGGGGGCGGGGT
GTGAAGCACTCAGTTGCCTTCTCGGGCC]TCGGCGCCCCCTATGTACGCCTCCCTGGGCTC
GGGTCCGGTCGCCCCTTTGCCCGCTTCTGTACCACCCTCAGTTCTCGGGTCCTGGAGCAC
CGGCGGCAGCAGGAGCTGCGTCCGGCAGGAGACGAAGAGCCCGGGCGGCGCTCGTACTTC
In the above example I am searching for primers outside the square brackets.

The output looks like this:

    1 GCATTGTAGTCTTCCCACCTCCCAGATGGCGGAGGGCAAGTAGCAAGGGGGCGGGGTGTG
         >>>>>>>>>>>>>>>>>>>>>**************************************

   61 AAGCACTCAGTTGCCTTCTCGGGCCTCGGCGCCCCCTATGTACGCCTCCCTGGGCTCGGG
      *************************

  121 TCCGGTCGCCCCTTTGCCCGCTTCTGTACCACCCTCAGTTCTCGGGTCCTGGAGCACCGG
                 <<<<<<<<<<<<<<<<<<<

  181 CGGCAGCAGGAGCTGCGTCCGGCAGGAGACGAAGAGCCCGGGCGGCGCTCGTACTTC

Note that the starred sequence is what we gave inside the brackets, and the primers were chosen outside the starred region.

Now paste the entire sequence with brackets back into the Primer3 page.

Select the product size range, say 450-500 (keep a flexible range, not a single number), and remove all other default ranges. Go for the minimum range that is still sufficient: for an exon of 300 bp with 50 bp of added flank on either side, totalling 400 bp (now in brackets), a range of 400-450 will do. Remember that the smaller the product size, the higher the chances of PCR success.

Then move down the page to General Primer Picking Conditions.
Primer size:
Put in the primer size range as per your need (usually min 18 bases, opt (optimum) 22 bases, max (maximum) 26 bases). The programme will try to give you primers of the optimum size, but if it cannot find one it will use the min and max limits.
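If the sequence is long, placing the brackets by hand in a word processor is error prone. The bracket-marking step can be sketched in a few lines of Python; the function name, the 0-based coordinates and the default flank of 50 bases are my own assumptions for illustration, not part of Primer3.

```python
def mark_target(sequence, target_start, target_end, flank=50):
    """Put [ ] around sequence[target_start:target_end] widened by `flank` bases.

    Coordinates are 0-based and the end is exclusive. Whitespace and line
    breaks in the pasted sequence are removed first.
    """
    seq = "".join(sequence.split()).upper()
    left = max(0, target_start - flank)
    right = min(len(seq), target_end + flank)
    return seq[:left] + "[" + seq[left:right] + "]" + seq[right:]


# Toy example: the target is bases 6-9, with 2 protected bases on each side.
print(mark_target("AAAACCCCGGGGTTTT", 6, 10, flank=2))  # prints AAAA[CCCCGGGG]TTTT
```

The returned string can be pasted into the Primer3 sequence box as it is.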

Primer Tm:

Care should be taken here. For a normal sequence, the optimum Tm is around 54-60 °C. But if your sequence is GC-rich it could go much higher, and if it is AT-rich it might be lower. Also try to design all the primers you use so that their optimum Tms fall in a small range. This way all your PCRs can run at a common annealing temperature and it is easier to standardize the primers. It frees you from amplifying each target separately, one PCR machine run at a time; all your PCRs can go in a single run. I try to keep the Tm close to 58-60 so that all my PCRs can later run with the same annealing temperature. Go for a higher or lower Tm only if your sequence is troublesome.

Min 54, opt 57, max 62 is a good option, but decide carefully depending on your sequence. Promoter sequences or GC-rich sequences may require a higher max Tm.
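To get a quick feel for how GC content pushes the Tm up, and for how far apart the Tms of a set of primers are, here is a rough Python sketch using the simple Wallace rule (2 degrees for each A or T, 4 degrees for each G or C). This is only an approximation meant for short oligos; Primer3 uses a more detailed thermodynamic calculation, so its values will differ. The function names and primer sequences are made up for illustration.

```python
def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Rough Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def tm_spread(primers):
    """Difference between the highest and lowest rough Tm in a primer set."""
    tms = [wallace_tm(p) for p in primers]
    return max(tms) - min(tms)


# Made-up primers, not designed against any real gene.
primers = ["ATGCATGCATGCATGCATGC", "ATATGCGCATATGCGCAT"]
for p in primers:
    print(p, f"GC {gc_percent(p):.0f}%", f"Tm ~{wallace_tm(p)}")
print("spread:", tm_spread(primers))
```

A small spread suggests that the primers could share one annealing temperature, which is the point of keeping the Tm range narrow.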

Leave all other options at default setting.

and click pick primers.

Here you have to check a few things:

Product size, Tm of the primers and primer complementarity (this should be the minimum possible, so repeat this exercise with changed parameters such as the region in brackets, Tm etc.). Also note where the primers sit in the original sequence.

Additional primers are also provided on the same page, so you may opt for another pair.
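Checking where a primer pair sits in the original sequence, and what product size it gives, can also be done with a short Python sketch. It assumes exact matches only: the left primer is searched as written, and the right primer is searched as its reverse complement because it binds the opposite strand. The function names and the toy sequences are made up for illustration.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(complement)[::-1]


def locate_amplicon(template, left_primer, right_primer):
    """Return (start, end, product_size) with 0-based start and exclusive end,
    or None if either primer has no exact match or the pair does not face
    each other."""
    t = "".join(template.split()).upper()
    start = t.find(left_primer.upper())
    right_site = t.find(reverse_complement(right_primer))
    if start == -1 or right_site == -1:
        return None
    end = right_site + len(right_primer)
    if right_site < start:
        return None
    return start, end, end - start


# Toy template: left primer site, 10 filler bases, right primer site.
template = "ATGCCGTA" + "TTTTTTTTTT" + "GGATCCAA"
print(locate_amplicon(template, "ATGCCGTA", "TTGGATCC"))  # prints (0, 26, 26)
```

If the function returns None for a real pair, it is worth checking that the primers were copied in the 5' to 3' direction.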

Method-II

A much simpler way of designing primers is NCBI Primer-BLAST http://www.ncbi.nlm.nih.gov/tools/primer-blast/.

This works almost the same way as Method-I, but with more user-friendly options. As it is comparatively new, I have used this method sparingly, but I have had feedback from users saying it works perfectly fine. The user interface is much simpler than in Method-I, though the output format is a bit clumsier than that of the Primer3 software.

(To be concluded)

Sunday, 11 November 2012

Some useful common tools in higher research (genomics/cell biology) -charts and graphs

Saturday, 10 November 2012

Some useful common tools in higher research (genomics/cell biology) _Primer design
