Thursday, August 09, 2007

Strigi now understands Xesam queries

Flavio wrote about Strigi getting Xesam query support. That is cool, because it lets me look up email I sent to a friend with this query:
<request>
  <query>
    <and>
      <equals>
        <field name="email.to"/>
        <string>Christoph</string>
      </equals>
      <equals>
        <field name="email.from"/>
        <string>Egon</string>
      </equals>
    </and>
  </query>
</request>

After saving the above as query.xml, I can run it with a strigicmd call:
strigicmd xesamquery -t clucene -d index/ -q query.xml
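
The save-and-run step can also be scripted. The following is only a minimal Python sketch: the helper names `save_query` and `strigicmd_args` are made up for illustration, the query text and the command-line arguments are copied from above, and the call is only attempted when a `strigicmd` binary is actually found on the PATH.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET

# The e-mail query from the post, kept as a plain string.
EMAIL_QUERY = """<request>
  <query>
    <and>
      <equals>
        <field name="email.to"/>
        <string>Christoph</string>
      </equals>
      <equals>
        <field name="email.from"/>
        <string>Egon</string>
      </equals>
    </and>
  </query>
</request>
"""


def save_query(xml_text, path="query.xml"):
    """Check that the query is well-formed XML, then write it to disk.

    Hypothetical helper, not part of Strigi.
    """
    ET.fromstring(xml_text)  # raises ParseError on malformed XML
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(xml_text)
    return path


def strigicmd_args(query_file, index_dir="index/", backend="clucene"):
    """Return the argument list for the strigicmd call shown above.

    Hypothetical helper; the arguments mirror the command in the post.
    """
    return ["strigicmd", "xesamquery", "-t", backend,
            "-d", index_dir, "-q", query_file]


if __name__ == "__main__":
    args = strigicmd_args(save_query(EMAIL_QUERY))
    if shutil.which("strigicmd"):
        subprocess.run(args, check=True)
    else:
        print("strigicmd not found; would run:", " ".join(args))
```

Parsing the text before writing it catches a mismatched tag early, before strigicmd ever sees the file.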

Now that proper nesting is fully functional (see the full query language specification), I can do all sorts of nice queries. For example, I can find all files that use the CML namespace:
<request>
  <query>
    <equals>
      <field name="xml.usesNamespace"/>
      <string caseSensitive="true">http://www.xml-cml.org/schema</string>
    </equals>
  </query>
</request>

Or CML embedded in XHTML (or vice versa), that is, files that use both namespaces:
<request>
  <query>
    <and>
      <equals>
        <field name="xml.usesNamespace"/>
        <string caseSensitive="true">http://www.xml-cml.org/schema</string>
      </equals>
      <equals>
        <field name="xml.usesNamespace"/>
        <string caseSensitive="true">http://www.w3.org/1999/xhtml</string>
      </equals>
    </and>
  </query>
</request>

And, after installing strigi-chemical, currently being developed by Alexandr, the GSoC student working on chemistry support for Strigi, I can run chemical queries too. For example, to get all molecules within a certain mass range:
<request>
  <query>
    <and>
      <greaterThan>
        <field name="chemistry.molecular_weight"/>
        <float>50</float>
      </greaterThan>
      <lessThan>
        <field name="chemistry.molecular_weight"/>
        <float>59</float>
      </lessThan>
    </and>
  </query>
</request>
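
Typing such a range query by hand for every pair of bounds gets tedious, so here is a rough Python sketch that generates it. The functions `comparison` and `range_query` are hypothetical helpers of mine, not part of Strigi or Xesam. They only reuse the element and field names from the query above.

```python
import xml.etree.ElementTree as ET


def comparison(tag, field_name, value):
    """Build one comparison element such as <greaterThan> or <lessThan>.

    Hypothetical helper; the value is written into a <float> child,
    as in the hand-written query.
    """
    node = ET.Element(tag)
    ET.SubElement(node, "field", name=field_name)
    ET.SubElement(node, "float").text = str(value)
    return node


def range_query(field_name, low, high):
    """Build <request><query><and>...</and></query></request> for low < field < high."""
    request = ET.Element("request")
    query = ET.SubElement(request, "query")
    both = ET.SubElement(query, "and")
    both.append(comparison("greaterThan", field_name, low))
    both.append(comparison("lessThan", field_name, high))
    return request


if __name__ == "__main__":
    tree = range_query("chemistry.molecular_weight", 50, 59)
    # Serialise to text; this can be saved to a file and passed to strigicmd.
    print(ET.tostring(tree, encoding="unicode"))
```

The output is unindented, single-line XML with the same elements as the query above.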

Or, give me all chemical files that contain a molecule with 'butane' in the name:
<request>
  <query>
    <and>
      <contains>
        <field name="content.mime_type"/>
        <string>chemistry</string>
      </contains>
      <contains>
        <field name="chemistry.name"/>
        <string>butane</string>
      </contains>
    </and>
  </query>
</request>
