486 Verbs account for 90% of occurrences


Looking at the meaningfulness of citations in Tahamtan, Iman, and Lutz Bornmann. “What Do Citation Counts Measure? An Updated Review of Studies on Citations in Scientific Documents Published between 2006 and 2018.” arXiv preprint arXiv:1906.04588 (2019).

Bertin and Atanassova (2014) showed that in the introduction section, “70 verbs account for 50% of all verb occurrences, and 486 verbs account for 90% of the occurrences”.
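To make that kind of coverage statistic concrete, here is a minimal sketch of how one could compute "how many distinct verbs cover X% of occurrences" from a frequency table. The function name and the toy counts are my own, invented for illustration; this is not the authors' code or data.

```python
from collections import Counter


def types_needed_for_coverage(counts, target):
    """Return how many of the most frequent items are needed to reach
    `target` (a fraction between 0 and 1) of all occurrences."""
    total = sum(counts.values())
    if total == 0:
        return 0
    running = 0
    # Walk the items from most to least frequent, accumulating occurrences.
    for rank, (_, freq) in enumerate(counts.most_common(), start=1):
        running += freq
        if running >= target * total:
            return rank
    return len(counts)


# Toy verb counts, made up for illustration (100 occurrences in total).
toy_verbs = Counter({"show": 50, "use": 25, "propose": 16, "find": 5, "argue": 4})

print(types_needed_for_coverage(toy_verbs, 0.5))  # 1 verb covers 50%
print(types_needed_for_coverage(toy_verbs, 0.9))  # 3 verbs cover 90%
```

Run on a real verb frequency table, the same function would give the two figures quoted above, assuming the counts are extracted the same way the authors did.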

Lots and lots of insights and data here ….


Note to self: follow up and look at related work.



A more serious study of the Public Git Archive (PGA)

Following up on the Octoverse clues, I uncovered this GEM: Markovtsev, Vadim, and Waren Long. “Public git archive: a big code dataset for all.” In Proceedings of the 15th International Conference on Mining Software Repositories, pp. 34-37. ACM, 2018. You can look at the arXiv version here.

This study points to the following as the most popular programming languages:

  1. C
  2. JS
  3. C++ 
  4. Java
  5. PHP
  6. Go
  7. Python
  8. Obj-C
  9. C#
  10. Ruby

If you’re into data mining and analysis of REALLY large public datasets, this one offers lots to work with. According to the authors, the Public Git Archive occupies 3.0 TB on disk. Enjoy ..



GitHub’s Octoverse is really providing some serious insight about what’s hot and what’s not with developers. For example, in terms of top projects by number of contributors:

1 Microsoft/vscode 19K
2 facebook/react-native 10K
3 tensorflow/tensorflow 9.3K


Top Growing Languages:

1 Kotlin 2.6X
2 HCL 2.2X
3 TypeScript 1.9X

Lots more data there … I wish they showed more information, not just the top 10s …

Hagen’s Biological and clinical data integration in healthcare study is great!

Just finished looking at Matt Hagen’s 2014 PhD dissertation, “Biological and clinical data integration and its applications in healthcare.” This is a great piece of work … You can find it here.

While it’s around 5 years old, the insights and discussion are excellent. I like the detailed breakdown of how different ontologies and vocabularies align (and how things fall through the cracks). I also liked the discussion of using Neo4j to analyze relationships and simplify searches and relationship mappings.

Particularly liked the discussion of using ontologies to “facilitate improved prioritization of intensive care admissions and accurate clustering of multimorbidity conditions”. THIS IS BIG, with enormous potential.

There is also a discussion of his BioSPIDA relational database translator and how it contrasts with the separate Entrez Gene, PubMed, CDD, RefSeq, MMDB, and BioSystems NCBI databases.

His Table 7.2, “Descriptions of patient clusters”, is rather illuminating, as is his discussion and analysis of ICU Electronic Health Records and the findings associated with morbidity outcomes.

For example, Cluster 1 contains the following Most Prevalent Conditions: Coronary arteriosclerosis, Hypercholesterolemia, Diabetes, Gastroesophageal reflux disease, Atrial fibrillation, Hyperlipidemia, Tobacco dependence. These led to the following Most Prevalent Procedures: Catheterization of left heart, Cardiopulmonary bypass operation, Angiocardiography of left heart.


I am surprised this work is not cited as much as it should be! IMHO, this work definitely should be used as a blueprint for additional investigations.



Tithonus dilemma – longevity without health

“Tithonus dilemma,” namely, the consequences of longevity without health and vigor. The dilemma plagues the project of keeping people alive indefinitely without their bodies and brains succumbing to age and cellular decay …

See the discussion in Gods and Robots: Myths, Machines, and Ancient Dreams of Technology by Adrienne Mayor.


Artificial Intelligence for Regenerative Medicine

Next on the research reading queue, pointers to applications of AI in Regenerative Medicine. We’ll be including this in discussions.


Principal texts:

Wikipedia: Regenerative medicine is a branch of translational research in tissue engineering and molecular biology which deals with the “process of replacing, engineering or regenerating human cells, tissues or organs to restore or establish normal function”. This field holds the promise of engineering damaged tissues and organs by stimulating the body’s own repair mechanisms to functionally heal previously irreparable tissues or organs.
Regenerative medicine also includes the possibility of growing tissues and organs in the laboratory and implanting them when the body cannot heal itself. If a regenerated organ’s cells would be derived from the patient’s own tissue or cells, this would potentially solve the problem of the shortage of organs available for donation, and the problem of organ transplant rejection.
Some of the biomedical approaches within the field of regenerative medicine may involve the use of stem cells. Examples include the injection of stem cells or progenitor cells obtained through directed differentiation (cell therapies); the induction of regeneration by biologically active molecules administered alone or as a secretion by infused cells (immunomodulation therapy); and transplantation of in vitro grown organs and tissues (tissue engineering).


Along these lines, I encountered this interesting title:


Zhavoronkov, Alex, Polina Mamoshina, Quentin Vanhaelen, Morten Scheibye-Knudsen, Alexey Moskalev, and Alex Aliper. “Artificial intelligence for aging and longevity research.” (2018).

Abstract: The applications of modern artificial intelligence (AI) algorithms within the field of aging research offer tremendous opportunities. Aging is an almost universal unifying feature possessed by all living organisms, tissues, and cells. Modern deep learning techniques used to develop age predictors offer new possibilities for formerly incompatible dynamic and static data types. AI biomarkers of aging enable a holistic view of biological processes and allow for novel methods for building causal models—extracting the most important features and identifying biological targets and mechanisms. Recent developments in generative adversarial networks (GANs) and reinforcement learning (RL) permit the generation of diverse synthetic molecular and patient data, identification of novel biological targets, and generation of novel molecular compounds with desired properties and geroprotectors. These novel techniques can be combined into a unified, seamless end-to-end biomarker development, target identification, drug discovery and real world evidence pipeline that may help accelerate and improve pharmaceutical research and development practices.

AI in medicine – not ready for prime time?

Am exploring what really can be said for AI in medicine.  There are lots of good things going on … but some reality seems to have set in.

I ran into this conclusion in the paper Deep Learning for Genomics: A Concise Overview by Yue and Wang at Carnegie Mellon. [Yue, Tianwei and Haohan Wang. “Deep Learning for Genomics: A Concise Overview.” CoRR abs/1802.00810 (2018).]

Current applications, however, have not brought about a watershed revolution in genomic research. The predictive performances in most problems have not reach the expectation for real-world applications, neither have the interpretations of these abstruse models elucidate insightful knowledge. A plethora of new deep learning methods is constantly being proposed but awaits artful applications in genomics.

I was really hoping we were farther along. Maybe there’s hope … there’s always hope [Elvis: Farther along we’ll know more about it. Farther along we’ll understand why. Cheer up my brother live in the sunshine]. Right now, what I am seeing with Watson for Genomics and other ‘production systems’ suggests lots of work ahead.