The Five Most Mutated Genes in Cancers – [A 2017 ICGC Perspective]

The International Cancer Genome Consortium (ICGC) has a portal that currently (May 2017) hosts data from 70 cancer projects spanning 16 countries.

Here are a few descriptors of the data (as of this writing):

– 19,305 donors
– 31 tumor types in 21 primary tumor sites
– data types include: simple somatic mutations (SSM), structural somatic mutations, copy number somatic mutations (CNSM), sequence and array based gene expression data, methylation data, protein expression data, etc.

This is big data: it comprises ~163,000 files totaling ~1.2 PB (petabytes), the equivalent of 1,200 terabytes or 1,200,000 gigabytes. That’s a lot of A, C, G and T sequences…
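To double-check the unit arithmetic, here’s a quick Python snippet. It assumes decimal (SI) prefixes, where each step is a factor of 1,000. The ~1.2 PB and ~163,000 files come from the portal summary; the average file size is only derived from those two numbers, not something the portal reports.

```python
# Decimal (SI) storage units: each prefix step is a factor of 1,000.
PB_IN_TB = 1_000
TB_IN_GB = 1_000

total_pb = 1.2        # ~1.2 PB hosted on the ICGC portal (May 2017)
n_files = 163_000     # ~163,000 files

total_tb = total_pb * PB_IN_TB
total_gb = total_tb * TB_IN_GB
avg_gb_per_file = total_gb / n_files  # derived, not reported by the portal

print(f"{total_tb:,.0f} TB = {total_gb:,.0f} GB")
print(f"average file size: ~{avg_gb_per_file:.1f} GB")
```

If binary prefixes (1,024) were meant instead, the totals would shift by a few percent.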

The portal is a great platform in and of itself: you can run advanced searches and ‘onsite’ data analyses, browse genomes, and much more. So, if you like numbers (like me), you can literally spend countless hours trying to make sense of this ever-growing ocean of data.

The purpose of this post is not to go deep, though; I may do that in later posts. Here, I’m only going to talk about the top 5 genes with high-impact mutations (simple somatic mutations) across all cancers, based on data from 10,648 donors.
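The ranking itself is simple to sketch in Python. This assumes a hypothetical list of (donor ID, gene, impact) records, for example exported from the portal’s SSM data; the field layout, the “HIGH” impact label, and the toy records (with placeholder gene names) are illustrative assumptions, not the portal’s exact schema or real results. Genes are ranked by the number of distinct donors with at least one high-impact mutation, so several hits in the same gene of one donor count once.

```python
from collections import defaultdict

# Hypothetical records: (donor_id, gene, functional_impact).
# Gene names are placeholders; real ICGC exports use their own columns.
records = [
    ("DO1", "GENE_A", "HIGH"),
    ("DO1", "GENE_A", "HIGH"),   # second hit in the same donor: counted once
    ("DO2", "GENE_A", "HIGH"),
    ("DO2", "GENE_B", "LOW"),    # low impact: ignored
    ("DO3", "GENE_B", "HIGH"),
    ("DO3", "GENE_C", "HIGH"),
    ("DO4", "GENE_A", "HIGH"),
    ("DO4", "GENE_C", "HIGH"),
]

def top_mutated_genes(records, n=5, impact="HIGH"):
    """Rank genes by the number of distinct donors with >= 1 mutation of the given impact."""
    donors_per_gene = defaultdict(set)
    for donor, gene, level in records:
        if level == impact:
            donors_per_gene[gene].add(donor)
    counts = {gene: len(donors) for gene, donors in donors_per_gene.items()}
    # Sort by donor count (descending), then gene name as a stable tie-break.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

for gene, n_donors in top_mutated_genes(records):
    print(f"{gene}: {n_donors} donors")
```

Counting donors rather than raw mutations keeps hypermutated samples from dominating the ranking.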

Using Machine Learning to Diagnose Cancer – A Tutorial


Some of you might have heard about diagnosing different health conditions with artificial intelligence and machine learning. Artificial intelligence is a buzzword these days, and to those who know little about programming it might actually seem real. But it’s not, at least not in 2017…

Like Kevin Kelly, I prefer to use AI as an acronym for augmented intelligence to describe learning machines.

So, what do these learning machines do, and why are they so powerful at certain tasks? Well, let’s look at a specific example.

I’ll be using a machine learning library in Python on a cancer dataset to classify tumors as malignant or benign.
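The library used in the tutorial isn’t named in this excerpt, so here is a hedged, dependency-free sketch of the core idea instead: a k-nearest-neighbors classifier written from scratch. The two features and the toy measurements are hypothetical, chosen only to show how a new tumor gets labeled by a majority vote of its closest labeled examples. With a real dataset you would also scale the features and evaluate on a held-out test set.

```python
import math
from collections import Counter

# Hypothetical training data: (mean_radius, mean_texture) -> label.
# Values are illustrative only, not taken from any real dataset.
training = [
    ((12.0, 15.0), "benign"),
    ((11.5, 17.0), "benign"),
    ((13.0, 14.5), "benign"),
    ((20.0, 24.0), "malignant"),
    ((19.0, 22.5), "malignant"),
    ((21.5, 25.0), "malignant"),
]

def knn_predict(sample, training, k=3):
    """Label `sample` by majority vote among its k nearest training points (Euclidean distance)."""
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((12.5, 16.0), training))  # -> benign
print(knn_predict((19.5, 23.0), training))  # -> malignant
```

An odd k avoids tied votes between the two classes.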

Biochemistry – Fatty Acid Metabolism [Video Series]

If you’ve been following my channel on YouTube, you know that some of the videos I make are biochemistry related. I recently completed part of a series on fatty acid metabolism, which follows Lehninger’s Principles of Biochemistry textbook. I’ll likely add more videos to the list in the future. But for now, here’s the partially complete series:

Genetic Mutations and Celiac Disease – My Analysis of 80 Genomes


This is my third analysis of genotype and phenotype data from OpenSNP, which is a platform where people share their genetic data.

The first analysis was about smoking and the second about diabetes. I took a few genetic mutations (SNPs) associated with these conditions and looked into the genetic and phenotype data provided by the users of the platform.
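One way to look at several SNPs per user at once is to count risk alleles across them. Here’s a minimal sketch; the SNP IDs and risk alleles are hypothetical placeholders (not the markers used in these analyses), and genotypes are assumed to be two-letter strings such as “AG”. Missing calls like “--” are skipped rather than counted, and the function also reports how many SNPs were actually called.

```python
# Hypothetical risk alleles; the SNP IDs are placeholders, not real markers.
RISK_ALLELES = {"rs_snp1": "T", "rs_snp2": "A", "rs_snp3": "G"}

def risk_allele_count(genotypes, risk_alleles=RISK_ALLELES):
    """Return (risk_allele_count, snps_called) for one user's {snp: genotype} dict."""
    total, called = 0, 0
    for snp, risk in risk_alleles.items():
        call = genotypes.get(snp, "--")
        if len(call) != 2 or "-" in call:
            continue  # missing or malformed call: skip this SNP
        total += call.count(risk)
        called += 1
    return total, called

user = {"rs_snp1": "CT", "rs_snp2": "AA", "rs_snp3": "--"}
print(risk_allele_count(user))  # -> (3, 2)
```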

Intermittent Fasting 16-8 for 8 Weeks in Resistance Trained Males – [2016 Study]


Researchers from universities in Italy, Brazil and the United States conducted a study comparing resistance-trained (RT) athletes who practiced intermittent fasting (16/8) with RT athletes who ate normally.

The experiment ran for 8 weeks, and the study was published in the Journal of Translational Medicine in October 2016. You can read it here.

In this post, I share some thoughts on this study. I also did a video review.

Genetic Mutations and Diabetes – My Analysis of 115 Genomes


Last week I began analyzing genotype and phenotype data available through OpenSNP, a platform where people share this type of information.

The first phenotype I looked into was smoking status.

Using Python, I took the smoking status reported by users and correlated it with a mutation (rs1051730) in CHRNA3, the gene for the alpha 3 subunit of the nicotinic acetylcholine receptor. A few genome-wide association studies (GWAS) have linked this mutation to nicotine dependence, alcohol abuse, and susceptibility to lung cancer.
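One common way to test such a correlation is a genotype-by-smoking-status contingency table followed by a Pearson chi-square test of independence. This is an assumed approach for illustration (the excerpt doesn’t say which statistic the original scripts used), and the user records below are hypothetical. The statistic is computed by hand so the block has no dependencies; with SciPy installed, `scipy.stats.chi2_contingency` would also give a p-value.

```python
from collections import Counter

# Hypothetical (genotype at rs1051730, smoking status) pairs, one per user.
users = [
    ("GG", "non-smoker"), ("GG", "non-smoker"), ("GG", "smoker"),
    ("AG", "smoker"), ("AG", "non-smoker"), ("AG", "smoker"),
    ("AA", "smoker"), ("AA", "smoker"), ("GG", "non-smoker"),
]

def chi_square(pairs):
    """Pearson chi-square statistic and degrees of freedom for a two-way table of pairs."""
    cells = Counter(pairs)
    rows = sorted({g for g, _ in pairs})
    cols = sorted({s for _, s in pairs})
    row_tot = Counter(g for g, _ in pairs)
    col_tot = Counter(s for _, s in pairs)
    n = len(pairs)
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n  # count expected under independence
            stat += (cells[(r, c)] - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return stat, dof

stat, dof = chi_square(users)
print(f"chi2 = {stat:.2f}, dof = {dof}")
```

With counts this small the chi-square approximation is unreliable; a real analysis needs enough users in every genotype/status cell.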

The point of that post was to offer a proof of concept and to present and interpret the results of my Python analysis. I wanted to set a precedent so that others could freely use and improve my scripts and my approach.

Of course, if you’re an OpenSNP user, you can gain a lot of insight by looking at your own genotype for this SNP (single nucleotide polymorphism) and comparing it with my findings. To see exactly what I did and to download the Python code, read the post.

Anyhow, I decided to continue with another analysis.

Analysis of 243 Genomes – My First Report [Nov. 2016]


About two weeks ago I learned about OpenSNP, a website where people share their genetic information and more. It is similar to 1000 Genomes, but I think it is much more interesting to work with because, aside from genetic information (SNP genotyping, exome data, etc.), most users also share phenotype data, and the data is not anonymized. This is what sparked my interest.

With users’ phenotype data and their genetic variants (SNPs or other relevant genetic information), I could run analyses and look for possible correlations. This is applied big data.
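Before any correlation can be computed, phenotype answers and genotype calls have to be joined per user. A minimal sketch of that join: the CSV column names (`user_id`, `smoker`), the free-text “yes”/“no” answers, and the genotype dict are all hypothetical assumptions, not OpenSNP’s actual export format. Users without a genotype, with a no-call, or with an unusable answer are dropped.

```python
import csv
import io

# Hypothetical phenotype export: one row per user (column names are assumptions).
PHENOTYPES = """user_id,smoker
1,yes
2,no
3,yes
4,rather not say
"""

# Hypothetical genotype calls per user for one SNP, e.g. parsed from raw files.
GENOTYPES = {1: "AG", 2: "GG", 3: "--", 5: "AA"}

def paired_records(phenotype_csv, genotypes, column):
    """Join phenotype rows with genotype calls by user id, keeping usable pairs only."""
    pairs = []
    for row in csv.DictReader(io.StringIO(phenotype_csv)):
        user = int(row["user_id"])
        call = genotypes.get(user)
        answer = row[column].strip().lower()
        if call is None or "-" in call or answer not in {"yes", "no"}:
            continue  # no genotype, no-call, or unusable phenotype answer
        pairs.append((call, answer))
    return pairs

print(paired_records(PHENOTYPES, GENOTYPES, "smoker"))
```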

In this post, I’ll explain how I conducted my first analysis. I want to provide an outline with enough relevant detail to serve as a reference point for future analyses. Of course, I could simply do this in private, but I’d rather post it on the blog so that others interested in running similar analyses have a starting point.

This requires knowledge of genomics, genomics-related software and raw data formats, programming, and a lot of patience.
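A common raw format on OpenSNP is the 23andMe export: comment lines starting with “#”, then tab-separated rsid, chromosome, position and genotype columns. Here’s a minimal parser assuming that layout (other providers use different columns, so check the header first). In the sample, only the rsid rs1051730 and its chromosome are real; the other IDs, positions and genotypes are placeholders.

```python
import io

# Placeholder sample in a 23andMe-style layout: rsid, chromosome, position, genotype.
SAMPLE = """# header comments start with '#'
# rsid\tchromosome\tposition\tgenotype
rs_example1\t1\t1000\tAA
rs1051730\t15\t2000\tAG
rs_example2\t1\t3000\t--
"""

def read_genotypes(handle, wanted=None):
    """Return {rsid: genotype} from a 23andMe-style file, optionally only for `wanted` rsids."""
    genotypes = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        rsid, _chrom, _pos, genotype = line.rstrip("\r\n").split("\t")
        if wanted is None or rsid in wanted:
            genotypes[rsid] = genotype
    return genotypes

print(read_genotypes(io.StringIO(SAMPLE), wanted={"rs1051730"}))
```

For a real file, pass an open file handle instead of `io.StringIO`.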

Radiotolerance Lessons from the Tardigrades


Image: female tardigrade containing eggs.

Hashimoto and colleagues (2016) recently published an article in Nature:

Extremotolerant tardigrade genome and improved radiotolerance of human cultured cells by tardigrade-unique protein

Tardigrades, a.k.a. water bears, are some of the most extreme organisms: they can survive in the most uninhabitable environments and withstand insults that would kill other living beings. Examples include very high and very low temperatures, high doses of radiation, high pressure, outer space, and others.

Here are some of the peculiarities of tardigrades in terms of gene expression:

The Hallmarks of Cancers #1 – Deregulating Cellular Energetics


I wrote a moderate-length review of Hanahan and Weinberg’s papers a few months ago.

In their papers, they discuss the traits most commonly shared by cancers, drawing on ~5 decades of research in the field.

While each cancer is unique, especially from a genetics standpoint, Hanahan and Weinberg discuss 8 hallmarks they found to be common to cancers.

Data from David Blaine’s 44-day Fast – [Metabolic and Physiologic]

Image: David Blaine, macro- and micronutrient losses (study 1).


David Blaine subjected himself to a prolonged fasting experiment that lasted from Sept. 5 to Oct. 19, 2003.

“A 30-year-old male, weight 96 kg, height 1.84 m, entered a transparent Perspex box on the banks of the river Thames in London and was suspended in the air from a crane for 44 days. During this period, he took only water to drink.” [2]

At the end of the fast, Blaine had lost 24.5 kg and ~7.4 BMI points (29 => 21.6). Although his BMI was not at a life-threatening level, he was admitted to the hospital for intensive and careful refeeding because some of his biomarkers were outside normal limits. [1]
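BMI is weight in kilograms divided by the square of height in meters. Plugging in only the weight, height and weight loss quoted from [2] gives the sketch below. The results (about 28.4 before and 21.1 after) don’t exactly match the cited 29 and 21.6, so treat the BMI figures as approximate.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

height = 1.84          # m, from the quoted case description
start_weight = 96.0    # kg
weight_lost = 24.5     # kg over the 44-day fast

before = bmi(start_weight, height)
after = bmi(start_weight - weight_lost, height)
print(f"BMI before: {before:.1f}, after: {after:.1f}, change: {before - after:.1f}")
print(f"weight lost: {weight_lost / start_weight:.1%}")
```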

Several research studies have been published based on Blaine’s self-experiment. Let’s look at some of the data…
