Wednesday, August 22, 2007

MDL SD files as folders: opening a single molfile

Alexandr's Google Summer of Code project is over, and he is wrapping up his code and blogging about his results. He just blogged about one of the goals he achieved: opening SD files as folders using Strigi's jstream technology. This gives chemistry tight support on the KDE desktop: you can browse an SD file as a folder and open a single MDL molfile entry from it with the FileOpen dialog, in addition to finding a specific molecule in the SD file by its InChI. Check the screenshots that Alexandr put online.

Thursday, August 09, 2007

Strigi now understands Xesam queries

Flavio wrote about Strigi getting Xesam query support, which is cool and allows me to look up email from me to a friend using this query:
<request>
  <query>
    <and>
      <equals>
        <field name="email.to"/>
        <string>Christoph</string>
      </equals>
      <equals>
        <field name="email.from"/>
        <string>Egon</string>
      </equals>
    </and>
  </query>
</request>

After saving the above as query.xml, I can issue a strigicmd call:
strigicmd xesamquery -t clucene -d index/ -q query.xml
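
The save-then-query step can also be scripted. What follows is only a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper names (equals_clause, email_query, strigicmd_args, save_query) are made up for illustration and are not part of Strigi. It builds the same email query, writes it to query.xml, and assembles the strigicmd argument list shown above without running it.

```python
import xml.etree.ElementTree as ET


def equals_clause(field_name, value):
    """Build one <equals> clause comparing a field with a string value."""
    clause = ET.Element("equals")
    ET.SubElement(clause, "field", {"name": field_name})
    ET.SubElement(clause, "string").text = value
    return clause


def email_query(to_name, from_name):
    """Build the <request> tree for mail sent from one person to another."""
    request = ET.Element("request")
    query = ET.SubElement(request, "query")
    both = ET.SubElement(query, "and")
    both.append(equals_clause("email.to", to_name))
    both.append(equals_clause("email.from", from_name))
    return request


def strigicmd_args(index_dir, query_file):
    """Return the argument list for the strigicmd call from the post."""
    return ["strigicmd", "xesamquery", "-t", "clucene",
            "-d", index_dir, "-q", query_file]


def save_query(request, path):
    """Write the query tree to a file that strigicmd can read with -q."""
    ET.ElementTree(request).write(path, encoding="utf-8")


if __name__ == "__main__":
    save_query(email_query("Christoph", "Egon"), "query.xml")
    # Hand this list to subprocess.run() on a machine with Strigi installed.
    print(" ".join(strigicmd_args("index/", "query.xml")))
```

Keeping the command as a list rather than a single string means it can be passed to subprocess.run() unchanged, with no shell quoting to worry about.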

Now that proper nesting works and everything is functional (see the full query language specification), I can do all sorts of nice queries. For example, I can find all files that use the CML namespace:
<request>
  <query>
    <equals>
      <field name="xml.usesNamespace"/>
      <string caseSensitive="true">http://www.xml-cml.org/schema</string>
    </equals>
  </query>
</request>

Or CML embedded in XHTML (or vice versa), by requiring both namespaces:
<request>
  <query>
    <and>
      <equals>
        <field name="xml.usesNamespace"/>
        <string caseSensitive="true">http://www.xml-cml.org/schema</string>
      </equals>
      <equals>
        <field name="xml.usesNamespace"/>
        <string caseSensitive="true">http://www.w3.org/1999/xhtml</string>
      </equals>
    </and>
  </query>
</request>

And, after having installed strigi-chemical, currently being developed by Alexandr, the GSoC student working on chemistry support for Strigi, I can run chemical queries too. For example, to get all molecules within a certain molecular weight range:
<request>
  <query>
    <and>
      <greaterThan>
        <field name="chemistry.molecular_weight"/>
        <float>50</float>
      </greaterThan>
      <lessThan>
        <field name="chemistry.molecular_weight"/>
        <float>59</float>
      </lessThan>
    </and>
  </query>
</request>
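
Since a range query is always the same pair of comparisons wrapped in an and, it is easy to generate for other bounds. Here is a rough sketch, again assuming Python's xml.etree.ElementTree; the function names comparison and range_query are hypothetical helpers and not a Strigi or Xesam API. It uses only the elements that appear in the query above, and I am assuming that greaterThan and lessThan exclude the bounds themselves.

```python
import xml.etree.ElementTree as ET


def comparison(tag, field_name, value):
    """Build a <greaterThan> or <lessThan> clause with a <float> operand."""
    clause = ET.Element(tag)
    ET.SubElement(clause, "field", {"name": field_name})
    ET.SubElement(clause, "float").text = str(value)
    return clause


def range_query(field_name, low, high):
    """Build a request for low < field < high (bounds assumed exclusive)."""
    request = ET.Element("request")
    query = ET.SubElement(request, "query")
    both = ET.SubElement(query, "and")
    both.append(comparison("greaterThan", field_name, low))
    both.append(comparison("lessThan", field_name, high))
    return request


if __name__ == "__main__":
    tree = range_query("chemistry.molecular_weight", 50, 59)
    # Prints the query on one line; save it to a file to use it with -q.
    print(ET.tostring(tree, encoding="unicode"))
```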

Or, give me all chemical files that contain a molecule with 'butane' in its name:
<request>
  <query>
    <and>
      <contains>
        <field name="content.mime_type"/>
        <string>chemistry</string>
      </contains>
      <contains>
        <field name="chemistry.name"/>
        <string>butane</string>
      </contains>
    </and>
  </query>
</request>