This consumer’s perspective on the current state of the Semantic Web.


I spent the last few days in Boston, attending the 4th annual Conference on Semantics in Healthcare And Life Sciences (CSHALS) 2011. The ideas of the semantic web are really powerful – self-describing data, a promise of ease in integration and reusability of data from any place and source in the world (wide web), a promise of a network of facts over which a machine can reason and infer implicit connections… Semantic Web IS the future. In fact, even in my own research I have tapped into the paradigms of the semantic web – I’ve attempted to represent the human genome as a network of components, which would allow me to make logical statements about the genomic context of the elements in a specific individual. (NOTE: genes, a subspecies of the genomic elements, make up less than 5% of the human genome! Our genome is destitute of genes, but overabundant in other elements that control when and how much of a specific combination of exons should be transcribed to produce a protein that is most useful to a cell at a point in time.)

“Promises, promises…” as it turned out, and here is why:

  1. The software infrastructure had been very tricky to assemble to meet even the most basic needs of my project – despite the published specs of the software products
  2. The implementation of features promised in the standards lagged, or the software vendors plainly refused to include basic functionality such as x>y comparisons!
  3. On top of everything, the performance issues stemming from the fact that semantic web software is still very immature (unlike the relational database software benefiting from decades of optimizations…) rendered my product unusable when populated with just a fraction of the required data…
  4. I lost any hope of ever being able to use a reasoner to query my data, abandoned the purely semantic web and went back to my trusty (and free) MySQL – without the ability to use the reasoner there was really very little incentive to continue the anguish described in the three previous points.

The Semantic Web is the future, but unfortunately the future is still quiiiiiite distant: that was my mindset over a year ago… CSHALS 2011 became an awesome opportunity for me to update my mindset, and learn about the cutting-edge developments in the world of the Semantic Web, where the field is heading, and whether my projects could return to the world of the Semantic Web…

Indeed, I learned a lot. The conference started with a set of tutorials providing hands-on guidance for newcomers and veterans of the semantic web. We converted some tabular data into RDF, mashed up several resources, and even analyzed some microarray data to find genes involved in Alzheimer’s Disease that were then integrated with Gene Ontology and published using the datapress plugin for WordPress, for a cool visualization, and all that BEFORE lunch! The tutorials were a lot of fun, but at the same time they were indicative that the entry into the world of the semantic web is still not a smooth process for just anyone; the learning curve can be quite steep…
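For readers who have never seen the "tabular data into RDF" step, here is a minimal sketch of the idea, not the tutorial's actual material. It uses only the Python standard library and writes plain N-Triples lines; the `http://example.org/` namespace, the column names and the sample rows are all made up for illustration, and column names are assumed to be safe to paste into a URI.

```python
import csv
import io

# Hypothetical namespace; any URI prefix you control would do.
BASE = "http://example.org/"


def rows_to_ntriples(csv_text, key_column):
    """Turn each CSV row into N-Triples lines.

    One subject per row (built from key_column), one predicate per
    remaining column, and the cell value as a plain string literal.
    Empty cells produce no triple.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}gene/{row[key_column]}>"
        for column, value in row.items():
            if column == key_column or value == "":
                continue
            predicate = f"<{BASE}vocab/{column}>"
            # Escape backslashes and double quotes inside the literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


# Made-up sample table, for illustration only.
SAMPLE = "symbol,chromosome,biotype\nGENE_A,chr1,protein_coding\nGENE_B,chr2,\n"

triples = rows_to_ntriples(SAMPLE, "symbol")
for line in triples:
    print(line)
```

Each row becomes a small star of statements around one subject, which is what makes the later "mash-up" possible: two tables that use the same subject URIs merge by simple concatenation.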

The following two days brought keynote speakers, tech demos, presentations, and, overall, some really interesting points that I will try to discuss in the following passages.

The very first (keynote) presentation of CSHALS by Toby Segaran, a professional involved in data management issues for years, turned out to be a depressing confirmation that all along I’ve wanted too much from the semantic web. To summarize his talk, semantic representation is great for representing modest amounts of data using semantic relationships (as opposed to fitting the data into more rigidly structured relational databases). The coolest (and most needed in the field of life sciences) aspects of the semantic web, reasoning and inference, Toby stated, are still somewhat of a “red herring”, just outside of our reach. And he knows this stuff. He is a data magnate dealing with software and web development for real-life data management issues, (now) working for “a search company in California”.

Toby’s talk was immediately followed by that of Chris Baker, the man behind SADI, a revolutionary framework for semantic annotation of software services. Thanks to SADI, a computer can potentially suggest how to proceed with your data – suggest a set of tools that will give you the output you are looking for!!! To contradict Toby, Chris demonstrated the power of reasoners and inference. Specifically, Chris was able to correct the mistakes of scientists who had annotated chemical compounds inaccurately. His reasoner analyzed a very complex ontology of classes of chemical compounds, and axioms describing the relationships among these compounds. With some iterative ontology creation and validation behind him, Chris showed the ability of a machine to catch human mistakes. Why this disagreement between Toby and Chris? The problem is the size of the data. To be more relevant and useful, a semantic web reasoner needs to be able to deal with HUGE quantities of complex data. Chris’ ontology was relatively tiny – only a few hundred compounds were considered and reasoned over. I would like to see a reasoner that can handle MUCH bigger datasets. I mean, if we are integrating/mashing up datasets we need to be able to effectively deal with large quantities of data that become even more connected, i.e. larger. A domain with just a few hundred entities is a very rare case in Life Sciences today, especially now that scientists have all sorts of cool tools to generate hundreds of millions of high-throughput genomic reads per experiment… Food for thought: just one human genome has 20,000 protein-coding genes (each locus contains a set of exons which can be transcribed and spliced into different transcripts, a process which is governed by multiple promoters and enhancers also somewhere on the human DNA), some 4,000,000 distinct repetitive elements, and an even greater number of loci transcribed into non-protein-coding RNAs… Chris’ analysis caught 4 mistakes made by humans in the annotation of just a few hundred chemical compounds. I shudder to think how many mistakes have been made in the current annotation of the entire human genome.
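To give a feel for what "a machine catching human annotation mistakes" means, here is a deliberately oversimplified toy. It is not Chris' ontology or reasoner: the two class rules, the compound names and the annotations are all invented, and real chemical classification is far richer than "one functional group implies one class". The only point is the pattern of re-deriving the class from the recorded facts and flagging entries where the human label disagrees.

```python
# Toy "axioms" (made up for illustration): a class applies when its
# defining functional group is recorded for the compound.
CLASS_RULES = {
    "Alcohol": "hydroxyl",
    "Amine": "amino",
}


def inferred_classes(groups):
    """Return every class whose defining group is present."""
    return {cls for cls, group in CLASS_RULES.items() if group in groups}


def find_suspect_annotations(compounds):
    """Flag compounds whose human-assigned class cannot be re-derived."""
    suspects = []
    for name, groups, annotated_as in compounds:
        inferred = inferred_classes(groups)
        if annotated_as not in inferred:
            suspects.append((name, annotated_as, sorted(inferred)))
    return suspects


# Hypothetical annotations: (name, recorded groups, class a human assigned).
COMPOUNDS = [
    ("compound_1", {"hydroxyl"}, "Alcohol"),
    ("compound_2", {"amino"}, "Alcohol"),
    ("compound_3", {"hydroxyl", "amino"}, "Amine"),
]

suspects = find_suspect_annotations(COMPOUNDS)
for name, claimed, derived in suspects:
    print(f"{name}: annotated as {claimed}, but the rules derive {derived}")
```

With a few hundred compounds this kind of check is cheap. The worry in the paragraph above is what happens when both the rule set and the number of entities grow by several orders of magnitude.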

Another contradiction that I found interesting was that between Lawrence Hunter, working hands-on with ‘omic data, and Eric Neumann, a pioneer and evangelist for the Semantic Web, who has worked in many interest groups that continue to shape the direction in which the semantic web is heading. Lawrence brought up my earlier point about the increasing volumes of data. Soon there will be more than one human genome, as sequencing is becoming so cheap that we are just this close to being able to sequence any genome for close to $1,000 (down from the initial >$1,000,000). The problem is that our understanding of the human genome available today is VERY incomplete. We don’t even know the functions of all our 20,000 genes (which are less than 5% of one human genome)! We, the scientists, simply NEED computers to help us sift through all the available data and be smart about the data that is already pouring into our labs. (In the lab where I work we can fill up 20 terabytes of space in a week with various sequencer data sets; this doubles when one tries to perform standard analyses such as genome mapping, annotations etc.) I think the semantic web is crucial in facilitating the “smartness” of our data management.

Eric made a point that I thought was contrary to that of Lawrence and my own about the data volumes. Eric does not see, or is not worried by, the influx of data just yet; he thinks that the future (potential and distant) “data tsunami” will be conquered by data filtering, which would cleanse the data sets down to only the parts that are manageable, “curated” and “interesting”, presumably discarding the rest. This makes me think of the good old days (just a decade ago) when the human genome was made of interesting genes, and all the other “junk” was masked out (i.e. discarded) from analyses by the RepeatMasker tool… What if I would like to use the non-interesting parts of the data (i.e. the “junk”)? I mean, I have done so (sans the semantic web technologies, despite my sincerest attempts), with some interesting insights about the junk!!! Or, a more generic example: what if I, a scientist, want to use the semantic web and reasoners to actually help me with the filtering of data? Reasoners seem to be ideal for catching contradictions and illogicalities!

In all fairness, Eric brought up many other issues with the core of the semantic web, stating that in its current state the semantic web is unsustainable due to the inefficiencies of DNS queries to locate EVERY URI of EVERY data unit on the web… I thought Eric’s discussion of the ontology vocabulary being abstract and detached from the tangibility of items was also very interesting. In fact, I would like to believe that our GELO ontology could be a solution that attaches reality to a concept in an ontology…

To make my long story shorter, my observations about the talks bring me to the following

Conclusions about the current state of the Semantic Web Technologies

  1. Software infrastructure assembly and maintenance requires a team of dedicated engineers:
    • In the current state of software, infrastructure, standards, etc. in the world of the semantic web, only big companies such as AstraZeneca or Novartis, or small-companies-which-will-be-bought-by-a-search-company-in-California, are capable of hiring a specialized team of knowledge engineers to deal with just that: engineering knowledge and integrating data, full time. From my own personal experience it is a full-time occupation: integrating data, keeping track of software updates, maintaining the semantic web software, tinkering with the semantic web software so it behaves. Unfortunately, for someone who needs to do research using an integrated resource of many big data sets while maintaining the unwieldy semantic web aspects of the resource, it’s much more productive NOT to use the semantic web at all.
    • Long term benefits of RDF be damned! I’ll convert my relational database to semantic web only when
      • I find a reliable distributed, open source, preferably free (I am a scientist on a budget) triple store
        • which can load 20 billion triples +
        • AND which can provide a transitive closure to all of them
        • AND maybe provide some custom indexing capabilities for which xSQLs are known
      • OWL 3.1 or even 4.2 emerges and stabilizes enough for software developers to provide FULL feature-set implementations that support my uber triple store described above.
      • finally, when I find some evidence that the semantic web is helpful to me rather than sabotaging my ability to graduate in time, apply for a grant in time, deliver my query result this century (all of which I can accomplish using an old fashioned SQL)
    • This is a highly undesirable state, making semantic web deployment a costly and thus, in the case of many small labs, impractical luxury. I know of several labs that just recently moved from a spreadsheet approach to a relational database for data storage and management. They are amazed how much easier certain tasks have become in the lab and how much more flexible and accessible their data has become to them. The tools to create and manage a relational database are aplenty, and easy to install and maintain. On the other side of the spectrum, the semantic web stores are expensive, and/or provided as quagmires of Java JAR files that only the mightiest of hackers have the patience to assemble, set up, configure, program and potentially use (if everything works)… Then again, why should software developers create a usable set of tools if the standards are still uncertain and continue to change?
  2. The semantics (e.g. ontologies, and vocabularies) that are needed to describe the data are either non-existent or scattered throughout the world wide web, some very difficult to find.
    • It is easy to find information about a taxonomy of genes involved in a particular compartment of a cell or a chemical process. Paradoxically, trying to semantically represent a simple “round” shape of a lesion (that a doctor had written in a diagnosis of a patient’s tumor) becomes tricky, as linguists, philosophers and clinicians will all probably need to sit down first and establish what it means for a lesion to be “round” in the context of other lesions. Without a common and extensive vocabulary to be used as metadata, the Semantic Web technologies are crippled. The Semantic Web has proven useful as a solution to data integration problems; however, if two (or more) independent vocabularies are created by groups unaware of each other, the same dataset will then potentially become represented in several different ways. How are 3 different semantic representations of the same dataset really different from having 3 different relational models representing the same ideas? Currently, semantic middleware is created to integrate independent relational databases. Unless an accessible and well-advertised global metadata thesaurus is created, conceptual middleware will have to be created to bridge different semantic interpretations of the same concepts.
  3. I saved the best for last. There are some really cool cases for the use of reasoners out there.
    • The successful uses of reasoners seem to gravitate towards problems that are much easier to contain than the heavy-volume genomic data sets over which I’ve longed to reason.
    • SADI, which I mentioned earlier, can infer a series of well-defined steps required to produce an output from a given input data set. Any software developer can tap into the API and register their service for anyone to use!
    • WINGS, on the other hand, is a semantic workflow management system that can provide a completely reproducible “methods” section of your publication (finally). What doesn’t WINGS do? The framework can optimize your workflow and split it into batches with granularity and precision dependent on how much time is allowed for an analysis, for example! And it plays nicely with GenePattern!
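A note on the “transitive closure” item in my triple store wish list above, since it is the requirement that hurts the most. The sketch below is only an illustration in plain Python, not how any particular triple store does it; the `part_of` facts and their labels are invented. It materializes every pair that can be reached by following a relation one or more times.

```python
from collections import defaultdict, deque


def transitive_closure(pairs):
    """Return every (a, c) such that c is reachable from a in one or more steps."""
    successors = defaultdict(set)
    for source, target in pairs:
        successors[source].add(target)

    closure = set()
    for start in list(successors):
        seen = set()
        queue = deque(successors[start])
        # Breadth-first walk; `seen` keeps cycles from looping forever.
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            queue.extend(successors.get(node, ()))
    return closure


# Toy "part_of" facts with illustrative labels, not real annotation.
PART_OF = [
    ("exon_1", "transcript_1"),
    ("transcript_1", "gene_1"),
    ("gene_1", "chromosome_1"),
]

closure = transitive_closure(PART_OF)
print(len(closure))  # 3 asserted facts become 6 once the implied ones are added
```

Even in this tiny chain the number of statements doubles, and for a chain of n items the closure has n(n-1)/2 pairs. That growth is a big part of why asking for a closure over billions of triples is such a tall order.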

The current drive of the development of semantic web technologies seems to be geared towards lowering web development costs, and deploying flashy, gimmicky web applications. There are things on the horizon (BigCouch, Triple Map, Knowledge Explorer, BEL and Genestruct framework, Gruff) that may make semantic web slightly less of a luxurious nuisance and more of an actual aid in the day-to-day tasks of a sample scientist. I’m not too optimistic about the scalability/affordability of these products that would make them usable for my own tasks within a year, however the next decade should definitely render Semantic Web a necessity ;-).

Looking forward to CSHALS 2012!!!

There, I’ve said it!




about Come2me


A new musical idea sometimes comes very unexpectedly… This time I was working on my thesis and needed a little break, so I sat down at my piano… and the rest just happened.

Overall, the theme is a little bluesy and maybe a touch sinister… I guess this is the state of mind I’ve been in lately, i.e. in these last few weeks before defense… After much deliberation about the title I’ve agreed (with myself) on come2me…

It took about 4 days from the spark in my head to the “completed” Wave on my computer…

Check out bandcamp to download Come2Me.

Restaurant week #1


OK, this has nothing to do with the Restaurant Week that the city of New Haven has been organizing recently. This is all about a kitchen in shambles (remodeling, utter destruction) and consequent inability to make my own food (aside from some basic microwaving of leftovers which are long since gone…)

For the last few days then I have been sampling the local restaurants and fast food joints scouting for sustenance…

I have to say, overall, there is NOTHING better than a breakfast steak with two eggs, potatoes, and heavily buttered rye bread at 7pm (courtesy of a local diner, Hamden Town House, but probably a staple of every diner I’ve ever been to). Surprisingly, a turkey melt sandwich with steamed and broiled broccoli piled on top of the turkey between the two pieces of bread is also sensational. I highly encourage whoever is reading this blog to visit (for breakfast only) the Brownstone House in Hamden (which has nothing to do with a gay bar in Waterbury), right in front of the refurbished Town Hall…

Steak, Fries and Broccoli

Delicious Food

Right across the street from the Brownstone House there is a restaurant: Mickey’s. If you were ever asked whether your soup should come in a bowl or a cup, prepare yourself for an experience of a lifetime: their soups come in gigantic troughs! Which is all the better considering that both the clam chowder and the black bean soup were home-heartedly amazing, and I just love soups. That was not the end though. That night’s special was hanger steak with broccoli rabe and fries. This was by far the best restaurant meal I’ve had in ages. The steak was seared to medium-well perfection, the broccoli was … well … rabe, and the fries… crispy, battered in something delicious, and without the bitter-oil aftertaste.

I sampled some gnocchi too – unbelievable.

Once or twice (or maybe I’ve lost count) I’ve sampled the offerings of The Golden Star. You can never go wrong with General Tso’s Delight, featuring jumbo shrimp and chicken pieces battered in some sweet sauce. I don’t ask… I indulge, but have problems sleeping afterwards… One plaza over there is Midori, a Korean/Japanese restaurant the size of a typical hole-in-the-wall izakaya I used to frequent while living in Japan. Extremely nice atmosphere, which reminded me of my Host Mom’s own bar in Hirakatashi… The sushi was very well done, and the bibimbap was quite tasty too.

Now for the not so good.

The first major disappointment was Taco Bell on Dixwell. If thinking outside of the bun means a pita wrap filled with salt and lard (or something even more greasy, gooey and artificial-tasting) then I’ll take the bun, thank you very much. The two sole chunks of chicken in the burrito were so unbelievably processed that I had a really strange aftertaste for two days! This was my first and last visit to TB (no, not you, ThaiBinh). In fact I still twitch thinking how awful the flavor was… For authentic and not much more expensive Mexican food I’ll stick to Mezcal in New Haven (awesome margaritas too).

I love seafood. As it was getting late one day, somewhat out of desperation I decided to look into Red Lobster in North Haven. A very nice server lady unveiled in front of me and Damon a wonderful world of a wood-burning grill which was apparently in operation, churning out fabulous meals for the surf (me) or turf (Damon) lovers! Drooling at the prospects, I ordered some grilled shrimp in garlic sauce… A few minutes later a complete disappointment arrived on my plate: my microwave oven makes shrimp that are less bland. That was the driest (no sauce), blandest ripoff of a meal I’ve experienced. There were two skewers of shrimp, a desiccated pile of rice and, thank God, I ordered a side dish of a potato – that was probably the most flavorful part of that meal. Damon’s meal was equally disappointing – a chicken breast with the sauce poured on after the “cooking process”! One could argue that a seafood joint does not have to do the non-seafood things well… One could also argue that it is REALLY difficult to screw up something as simple as chicken (and yet it did happen)… And, of course, there is the ugly matter of tasteless, dry and yucky shrimp in a “seafood restaurant”…

Across from there – The Olive Garden. When you come you’re family (as the commercial says, and quite true indeed – aside from being very stingy on water refills, the service was pretty friendly… almost too friendly); when you leave, however, you’re bloated and sick! I know I was. Even Golden Star can never get me to such a stage. I ordered some soup: gnocchi and chicken… with every spoonful my arteries were clogged by the cream in the soup, but that was still nothing compared to the outcry of my sodium channels hopelessly losing the battle to the deluge of ions… Salty, greasy… and that’s before I realized that the gnocchi was overcooked… The memories of the gnocchi from Mickey’s a few days earlier were just too strong to be satisfied by this poor excuse for Italian food…

Oh, and how does one make bland breadsticks taste “good”? Well – Olive Garden seems to have perfected the art of adding flavor to breadsticks: baste the poor bread with a salt and oil blend… A slice of fluffy, flavourful Italian bread was nowhere to be seen…

Oddly enough, there was a LOT of basil everywhere. Clearly if there is basil and oregano then the meal must be Italian. The appetizer bread had TONS of basil on it (I suspect there was no flat bread underneath all the basil… I couldn’t verify, as I was digging, and digging, and never actually got through the layers of basil), the spaghetti sauce I ordered had tons of basil in it, the meatballs contained a lot of basil… Why mask/overpower all the flavors with basil? Are you hiding something, Olive Garden? Wait, was the only flavor in everything basil? Let’s just say that I will not be coming back to OG. Brazzi’s in New Haven had some really awesome spaghetti with meatballs!!! IKEA’s meatballs (part meat, part particle board) were better than OG’s!!!


that’s that…

It will probably be another week or so before I can make my own food, so… there may be more mis-adventures of shpakOO scavenging for food…

4 years ago…


…shpakOO spilled chocolate all over himself at Capuccino’s

making a great first impression… that ended up lasting for 4 years and counting 🙂

… and now you know the news, of this Jan 20th…

thanks misiu 🙂



I have been feeling rather creative lately… I think all the stress comes out this way …
No matter, I’m really happy with the way STILL, the radio edit version, came out… First of all it is anything BUT still :).
