Earth BioGenome Project


[Lewin, H. A., Robinson, G. E., Kress, W. J., Baker, W. J., Coddington, J., Crandall, K. A., … & Goldstein, M. M. (2018). Earth BioGenome Project: Sequencing life for the future of life. Proceedings of the National Academy of Sciences, 115(17), 4325-4333.]

this should make one humble. …

We are only just beginning to understand the full majesty of life on Earth. Although 10–15 million eukaryotic species and perhaps trillions of bacterial and archaeal species adorn the Tree of Life, ∼2.3 million are actually known, and of those, fewer than 15,000, mostly microbes, have completed or partially sequenced genomes.


Pretty Ambitious!  … and rather important!

Carotenoids Database provides information on 1182 natural carotenoids 

The Carotenoids Database looks pretty cool!

According to the site, it “currently provides information on 1182 natural carotenoids in 700 source organisms”.

Check it out here 

A recent review paper,  [Rodriguez-Concepcion, M., Avalos, J., Bonet, M. L., Boronat, A., Gomez-Gomez, L., Hornero-Mendez, D., … & Ribot, J. (2018). A global perspective on carotenoids: Metabolism, biotechnology, and benefits for nutrition and health. Progress in lipid research, 70, 62-93.]


Carotenoids are isoprenoid metabolites synthesized by all photosynthetic organisms (including plants, algae and cyanobacteria) and some non-photosynthetic archaea, bacteria, fungi and animals. In photosynthetic systems, carotenoids participate in light harvesting and they are essential for photoprotection



In addition, carotenoids can be cleaved to produce compounds with roles as growth regulators, such as abscisic acid (ABA) and strigolactones, as well as bioactive molecules. Most animals (including humans) do not synthesize carotenoids de novo but take them in the diet and use them as essential precursors for the production of retinoids such as vitamin A. Additionally, carotenoids have been proposed to confer other health benefits whose discovery is spurring their use in functional food products.


This needs to be included in the Artificial Intelligence for Medicine Initiative

Bats are the longest-lived mammals for their size

I didn’t know that 🙂 apparently others do … cool!

Only 19 species of mammal are longer-lived than humans given their body size, and 18 of these species are bats

check this paper out.  [Foley, Nicole M., Graham M. Hughes, Zixia Huang, Michael Clarke, David Jebb, Conor V. Whelan, Eric J. Petit et al. “Growing old, yet staying young: The role of telomeres in bats’ exceptional longevity.” Science advances 4, no. 2 (2018): eaao0926.]



klotho and βklotho, were highly correlated with longevity

More later … if you really want the details, look at:


Li, Jun-Yan, Hsin-Yi Chen, Wen-Jie Dai, and Calvin Yu-Chian Chen. “Deep Learning to Investigate Longevity Drug.” Available at SSRN 3361157 (2019).


In any case, from the virtual experiment it looks like Antifebrile Dichroa, Arecae Semen and Gelsemium sempervirens are part of the mystery.

Watson for Oncology (WFO) – more details

Back to Watson for Oncology (WFO). Today was deep dive day, looking at which papers were written specifically about WFO.

On Sunday, June 23, 2019, I searched Google Scholar; below are the main takeaways from what I could find.


  • Shows promise
  • Not ready for solo flight (i.e. needs clinicians to work with it).
  • Benefits from adding diagnostic tests like GEA (gene expression assays)
  • Keep working on improving WFO, and understand specifics better.


Literature I looked at is listed below. I will go through it again in more detail and provide further insights.

  1. Choi, Y. I., Chung, J. W., Kim, K. O., Kwon, K. A., Kim, Y. J., Park, D. K., … & Sung, K. H. (2019). Concordance Rate between Clinicians and Watson for Oncology among Patients with Advanced Gastric Cancer: Early, Real-World Experience in Korea. Canadian Journal of Gastroenterology and Hepatology, 2019.
  2. Kim, Y. Y., Oh, S. J., Chun, Y. S., Lee, W. K., & Park, H. K. (2018). Gene expression assay and Watson for Oncology for optimization of treatment in ER-positive, HER2-negative breast cancer. PloS one, 13(7), e0200100.
  3. Schmidt, C. (2017). MD Anderson breaks with IBM Watson, raising questions about artificial intelligence in oncology. JNCI: Journal of the National Cancer Institute, 109(5).
  4. Zhang, X. C., Zhou, N., Zhang, C. T., Lv, H. Y., Li, T. J., Zhu, J. J., … & Liu, G. (2017). 544P Concordance study between IBM Watson for Oncology (WFO) and clinical practice for breast and lung cancer patients in China. Annals of Oncology, 28(suppl_10), mdx678-001.
  5. Zou, F., Liu, C. Y., Liu, X. H., Tang, Y. F., Ma, J. A., & Hu, C. H. (2018). Concordance Study between IBM Watson for Oncology and Real Clinical Practice for Cervical Cancer Patients in China: A Retrospective Analysis. Available at SSRN 3287513.
  6. Somashekhar, S. P., Sepúlveda, M. J., Puglielli, S., Norden, A. D., Shortliffe, E. H., Rohit Kumar, C., … & Ramya, Y. (2018). Watson for Oncology and breast cancer treatment recommendations: agreement with an expert multidisciplinary tumor board. Annals of Oncology, 29(2), 418-423.
  7. Somashekhar, S. P., Sepúlveda, M. J., Norden, A. D., Rauthan, A., Arun, K., Patil, P., … & Kumar, R. C. (2017). Early experience with IBM Watson for Oncology (WFO) cognitive computing system for lung and colorectal cancer treatment.
  8. Somashekhar, S. P., Kumarc, R., Rauthan, A., Arun, K. R., Patil, P., & Ramya, Y. E. (2017). Abstract S6-07: Double blinded validation study to assess performance of IBM artificial intelligence platform, Watson for oncology in comparison with Manipal multidisciplinary tumour board–First study of 638 breast cancer cases.
  9. Liu, C., Liu, X., Wu, F., Xie, M., Feng, Y., & Hu, C. (2018). Using artificial intelligence (Watson for oncology) for treatment recommendations amongst Chinese patients with lung cancer: Feasibility study. Journal of medical Internet research, 20(9), e11087.
  10. Ross, C., & Swetlitz, I. (2017). IBM pitched its Watson supercomputer as a revolution in cancer care. It’s nowhere close. STAT News.
  11. Zauderer, M. G., Gucalp, A., Epstein, A. S., Seidman, A. D., Caroline, A., Granovsky, S., … & Petri, J. (2014). Piloting IBM Watson Oncology within Memorial Sloan Kettering’s regional network.
  12. Herath, D. H., Wilson-Ing, D., Ramos, E., & Morstyn, G. (2016). Assessing the natural language processing capabilities of IBM Watson for oncology using real Australian lung cancer cases.
  13. Bach, P., Zauderer, M. G., Gucalp, A., Epstein, A. S., Norton, L., Seidman, A. D., … & Keesing, J. (2013). Beyond Jeopardy!: Harnessing IBM’s Watson to improve oncology decision making.
  14. Kris, M. G., Gucalp, A., Epstein, A. S., Seidman, A. D., Fu, J., Keesing, J., … & Setnes, M. (2015). Assessing the performance of Watson for oncology, a decision support system, using actual contemporary clinical cases.

486 Verbs account for 90% of occurrences


Looking at the meaningfulness of citations in Tahamtan, Iman, and Lutz Bornmann. “What Do Citation Counts Measure? An Updated Review of Studies on Citations in Scientific Documents Published between 2006 and 2018.” arXiv preprint arXiv:1906.04588 (2019).

Bertin and Atanassova (2014) showed that in the introduction section, “70 verbs account for 50% of all verb occurrences, and 486 verbs account for 90% of the occurrences”.
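A coverage figure like this can be computed from any list of verb occurrences. Here is a minimal Python sketch, assuming you already have the verbs extracted as a flat list of strings. The function name `types_needed_for_coverage` and the toy data are mine, for illustration only; this is not the method or data of Bertin and Atanassova.

```python
from collections import Counter


def types_needed_for_coverage(tokens, fraction):
    """Return how many of the most frequent distinct items are needed
    to account for at least `fraction` of all occurrences."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0
    running = 0
    # Walk the items from most to least frequent, accumulating occurrences.
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        running += count
        if running / total >= fraction:
            return rank
    return len(counts)


# Toy data (hypothetical): 10 verb occurrences across 4 distinct verbs.
verbs = ["show"] * 5 + ["use"] * 3 + ["find"] + ["propose"]
half = types_needed_for_coverage(verbs, 0.5)
ninety = types_needed_for_coverage(verbs, 0.9)
print(half, ninety)
```

Run over the verbs of a real corpus of introduction sections, the same two calls would give numbers comparable to the 70 and 486 quoted above.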

Lots and lots of insights and data here ….


Note to self: follow up and look at related work.


A more serious study of the Public Git Archive (PGA)

Following up on the Octoverse clues, I uncovered this gem: Markovtsev, Vadim, and Waren Long. “Public git archive: a big code dataset for all.” In Proceedings of the 15th International Conference on Mining Software Repositories, pp. 34-37. ACM, 2018. You can look at the arXiv version here.

This study points to the following as the most popular programming languages:

  1. C
  2. JS
  3. C++ 
  4. Java
  5. PHP
  6.  Go
  7. Python
  8. Obj-C
  9. C#
  10. Ruby

If you’re into data mining and analysis of REALLY large public datasets, this one offers lots to work with. According to the authors, the Public Git Archive occupies 3.0 TB on disk. Enjoy!



GitHub’s Octoverse provides some serious insight into what’s hot and what’s not with developers. For example, the top projects by number of contributors:

1 Microsoft/vscode 19K
2 facebook/react-native 10K
3 tensorflow/tensorflow 9.3K


Top Growing Languages:

1 Kotlin 2.6X
2 HCL 2.2X
3 TypeScript 1.9X

Lots more data there … I wish they showed more information, not just the top 10s …

Hagen’s Biological and clinical data integration in healthcare study is great!

Just finished looking at Matt Hagen’s 2014 “Biological and clinical data integration and its applications in healthcare.” PhD  dissertation. This is a great piece of work … You can find it here.

While it’s around 5 years old, the insights and discussion are excellent. I like the detailed breakdown of how different ontologies and vocabularies align (and how things fall through the cracks). I liked the discussion of using Neo4j to analyze relationships and simplify searches and relationship mappings.

I particularly liked the discussion of using ontologies to “facilitate improved prioritization of intensive care admissions and accurate clustering of multimorbidity conditions”. THIS IS BIG, with enormous potential.

Also worth noting is the discussion of his BioSPIDA relational database translator and its contrast with the separate Entrez Gene, Pubmed, CDD, Refseq, MMDB, and Biosystems NCBI databases.

His Table 7.2: Descriptions of patient clusters is rather illuminating, as is his discussion and analysis of ICU Electronic Health Records and findings associated with morbidity outcomes.

For example, Cluster 1 contains the following Most Prevalent Conditions: Coronary arteriosclerosis, Hypercholesterolemia, Diabetes, Gastroesophageal reflux disease, Atrial fibrillation, Hyperlipidemia, Tobacco dependence. These led to the following Most Prevalent Procedures: Catheterization of left heart, Cardiopulmonary bypass operation, Angiocardiography of left heart.


I am surprised this work is not cited as much as it should be! IMHO, this work definitely should be used as a blueprint for additional investigations.



Tithonus dilemma – longevity without health

“Tithonus dilemma,” namely, the consequences of longevity without health and vigor. The dilemma plagues the project of keeping people alive indefinitely without their bodies and brains succumbing to age and cellular decay.

See the discussion in Gods and Robots: Myths, Machines, and Ancient Dreams of Technology by Adrienne Mayor.