Friday, November 16, 2007

Friend-of-a-Friend (FOAF) support in Strigi

Last week I hacked on a Strigi plugin for FOAF files. Now, one should not expect FOAF files on one's desktop soon... unless you start indexing your Konqueror history. I have not seen that feature yet, but it should not be overly difficult to implement for a skilled Konqueror developer (just use Strigi's D-Bus interface).

However, HTML files may have this line in the <head> element:
<link href='http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf'
rel='meta' title='FOAF' type='application/rdf+xml'/>

This could be the trigger for a Strigi plugin to download this file and provide it as a substream of the HTML file. I am aware of the security issues that immediately pop up, but those are surely something we can deal with.
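How such a plugin could spot the trigger can be sketched in a few lines of shell. The `foaf_links` function below is hypothetical (it is not part of the strigi-foaf plugin): it joins the HTML into one line, picks out `<link>` elements with `title='FOAF'` and type `application/rdf+xml`, and prints their `href`. It relies on regular expressions rather than a real HTML parser, so treat it as an illustration only. It deliberately stops short of downloading anything, which is exactly where the security questions start.

```bash
# Hypothetical helper, not part of strigi-foaf: print the href of every
# <link> element that advertises a FOAF file. Regex-based sketch only;
# a real plugin would use a proper HTML parser.
foaf_links() {
  # join all lines so that a <link> spread over several lines still matches
  tr '\n' ' ' < "$1" \
    | grep -o '<link[^>]*>' \
    | grep -i "title=['\"]FOAF['\"]" \
    | grep -i "type=['\"]application/rdf+xml['\"]" \
    | sed -E "s/.*href=['\"]([^'\"]*)['\"].*/\1/"
}
```

For an HTML file containing the `<link>` element shown above, `foaf_links page.html` should print the foaf.xrdf URL.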

Using this approach, the whole semantic desktop takes shape. Say I search my desktop for some topic; Strigi will then additionally make me aware that I recently read the blog of someone who showed interest in that topic too. Moreover, it will even allow Strigi to tell me which projects on SourceForge are related to this topic.

Far-fetched? No, it's really just around the corner. If interested, you can find the source code in KDE SVN under trunk/playground/utils/strigi-foaf. The Konqueror history hack is not implemented yet, as I first need to know what efforts are already ongoing in that respect.

Wednesday, August 22, 2007

MDL SD files as folders: opening a single molfile

Alexandr's Google Summer of Code project is over, and he is wrapping up his code and blogging about his results. He just blogged about one of the goals he achieved: opening SD files as folders using Strigi's jstream technology. This provides tight support for chemistry on the KDE desktop: browse an SD file as a folder, open a single MDL molfile entry from the SD file with the File Open dialog, all in addition to finding a specific molecule in the SD file by its InChI. Check the screenshots that Alexandr put online.
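Browsing an SD file as a folder works because an SD file is essentially a concatenation of MDL molfile records, each terminated by a `$$$$` line, optionally with SD data fields between `M  END` and `$$$$`. The hypothetical `split_sdf` helper below makes that structure concrete by writing the molfile part of each record to a numbered file. jstream does this virtually, without writing any files, so this is only a sketch of the idea and not Alexandr's code.

```bash
# Hypothetical helper: split an SD file into numbered molfiles in a directory.
# The molfile part of each record (up to and including "M  END") is written
# to OUTDIR/0001.mol, OUTDIR/0002.mol, ...; SD data fields are dropped.
split_sdf() {
  local sdf=$1 outdir=$2
  mkdir -p "$outdir" || return 1
  awk -v dir="$outdir" '
    BEGIN { n = 1; inmol = 1; file = "" }
    /^[$][$][$][$]/ {              # end of record: prepare the next one
      if (file != "") close(file)
      n++; inmol = 1; file = ""
      next
    }
    inmol {
      if (file == "") file = sprintf("%s/%04d.mol", dir, n)
      print > file
      if ($0 ~ /^M  END/) inmol = 0  # the rest of the record is SD data
    }
  ' "$sdf"
}
```

Usage: `split_sdf molecules.sdf molecules/` leaves one molfile per record in `molecules/`.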

Thursday, August 09, 2007

Strigi now understands Xesam queries

Flavio wrote about Strigi getting Xesam query support, which is cool: it allows me to look up emails I sent to a friend using this query:
<request>
<query>
<and>
<equals>
<field name="email.to"/>
<string>Christoph</string>
</equals>
<equals>
<field name="email.from"/>
<string>Egon</string>
</equals>
</and>
</query>
</request>

After saving the above as query.xml, I can then run it with strigicmd:
strigicmd xesamquery -t clucene -d index/ -q query.xml
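Writing these XML files by hand gets tedious. A minimal sketch, assuming all you need is an `<and>` of `<equals>` clauses: the hypothetical `xesam_and` function (not part of Strigi) takes `field=value` arguments and prints the query, escaping `&`, `<` and `>` in the values.

```bash
# Hypothetical helper, not part of Strigi: build a Xesam query that ANDs
# together <equals> clauses given as field=value arguments.
xesam_and() {
  local pair field value
  echo "<request>"
  echo "<query>"
  echo "<and>"
  for pair in "$@"; do
    field=${pair%%=*}
    # escape the XML special characters in the value
    value=$(printf '%s' "${pair#*=}" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g')
    echo "<equals>"
    echo "<field name=\"$field\"/>"
    echo "<string>$value</string>"
    echo "</equals>"
  done
  echo "</and>"
  echo "</query>"
  echo "</request>"
}
```

Running `xesam_and email.to=Christoph email.from=Egon > query.xml` reproduces the query above, which can then be passed to strigicmd as before.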

Now, with nesting and everything else working (see the full query language specification), I can do all sorts of nice queries, such as finding all files that use the CML namespace:
<request>
<query>
<equals>
<field name="xml.usesNamespace"/>
<string caseSensitive="true">http://www.xml-cml.org/schema</string>
</equals>
</query>
</request>

Or files with CML possibly embedded in XHTML (or vice versa), i.e. files that use both namespaces:
<request>
<query>
<and>
<equals>
<field name="xml.usesNamespace"/>
<string caseSensitive="true">http://www.xml-cml.org/schema</string>
</equals>
<equals>
<field name="xml.usesNamespace"/>
<string caseSensitive="true">http://www.w3.org/1999/xhtml</string>
</equals>
</and>
</query>
</request>

And, after installing strigi-chemical, currently being developed by Alexandr, the GSoC student working on chemistry support for Strigi, I can do chemical queries. For example, to find all files with a molecule in a certain molecular weight range:
<request>
<query>
<and>
<greaterThan>
<field name="chemistry.molecular_weight"/>
<float>50</float>
</greaterThan>
<lessThan>
<field name="chemistry.molecular_weight"/>
<float>59</float>
</lessThan>
</and>
</query>
</request>
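The same approach works for ranges. The hypothetical `xesam_range` helper below prints a query like the one above for any field and bounds, and refuses bounds that do not look like plain decimal numbers. The `<float>` element is simply taken from the example query; check the query language specification before using it for other value types.

```bash
# Hypothetical helper, not part of Strigi: print a Xesam query for
# LOW < FIELD < HIGH, using the <float> value type of the example above.
xesam_range() {
  local field=$1 low=$2 high=$3
  local num_re='^-?[0-9]+([.][0-9]+)?$'
  # only accept plain decimal numbers as bounds
  [[ $low =~ $num_re && $high =~ $num_re ]] || return 1
  cat <<EOF
<request>
<query>
<and>
<greaterThan>
<field name="$field"/>
<float>$low</float>
</greaterThan>
<lessThan>
<field name="$field"/>
<float>$high</float>
</lessThan>
</and>
</query>
</request>
EOF
}
```

For example, `xesam_range chemistry.molecular_weight 50 59 > query.xml` regenerates the query above.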

Or, give me all chemical files that contain a molecule with 'butane' in its name:
<request>
<query>
<and>
<contains>
<field name="content.mime_type"/>
<string>chemistry</string>
</contains>
<contains>
<field name="chemistry.name"/>
<string>butane</string>
</contains>
</and>
</query>
</request>

Thursday, July 26, 2007

Small script for ignoring files in Subversion

Subversion takes a slightly different, and more general, approach to ignoring files and directories. Unlike CVS, which uses .cvsignore files, SVN uses its property mechanism (the svn:ignore property). I just wrote a small script to help me add ignores:

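#!/bin/bash
# svnignore: add patterns to the svn:ignore property of the current directory
# usage: svnignore PATTERN...  (quote wildcards, e.g. svnignore '*.o')

TMPFILE=$(mktemp) || exit 1

# add whatever already is ignored; read it line by line so that
# patterns like *.o are not expanded by the shell
svn propget svn:ignore . 2>/dev/null | while IFS= read -r i; do
  [ -n "$i" ] || continue
  echo "Adding $i"
  printf '%s\n' "$i" >> "${TMPFILE}"
done

# add new things to ignore
for i in "$@"; do
  echo "Adding $i"
  printf '%s\n' "$i" >> "${TMPFILE}"
done

svn propset svn:ignore -F "${TMPFILE}" .

rm -f "${TMPFILE}"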

There might be an easier trick to do this, but now at least I have this nice one-liner:

svnignore somefile
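A possible refinement is to skip blank lines and patterns that are already in svn:ignore, so that running the command twice does not add duplicates. It is sketched here as a hypothetical `merge_ignores` helper (not part of Subversion) that reads the current patterns on standard input and appends the new ones given as arguments.

```bash
# Hypothetical helper, not part of Subversion: print the existing ignore
# patterns (stdin) followed by the new ones (arguments), dropping blank
# lines and keeping only the first occurrence of each pattern.
merge_ignores() {
  # the extra echo terminates a last stdin line that lacks a newline
  { cat; echo; printf '%s\n' "$@"; } | awk 'NF && !seen[$0]++'
}
```

Usage: `svn propget svn:ignore . | merge_ignores '*.o' build > patterns.txt && svn propset svn:ignore -F patterns.txt .`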

Friday, June 01, 2007

The GSoC has started: towards a chemical desktop

This Thursday the GSoC students started working, and Alexandr posted his plans for the chemical semantic desktop. Alexandr's work will be based on the Strigi/Nepomuk framework, and Liquidat just wrote a very nice status report about it.

Friday, April 27, 2007

GSoC Meeting with Alexandr

Yesterday I met with Alexandr to discuss his GSoC project, including the time schedule. During the mentors' meeting to work out the final rankings, one fellow mentor argued that this project is too specialized for KDE. We therefore discussed how we can maximize its effect on the rest of the KDE project, and ideas that came up include a dedicated query tool for complex data (such as chemical data). Anyway, this will be discussed in our blogs soon.

Meanwhile, I have registered with the new Planet SoC, which was announced on the Summer of Code Blog.

Saturday, April 14, 2007

GSoC: towards a chemical semantic desktop

Now that I am officially a Google Summer of Code mentor for KDE's participation, it was high time to bring my KDE4 install up to date. Meanwhile, Jos' Strigi toolkit is already well integrated, and Jerome has updated the chemical kfile plugins to the new Strigi-based architecture.

I was talking to Phreedom on IRC about the ontologies used by Strigi, and added one for chemistry. It currently has the fields chemistry.inchi, chemistry.molecular_formula, chemistry.molecular_weight, chemistry.pdbid, and chemistry.xray_resolution, but more are expected to be added. I already updated kfile_chemical to make use of these fields, as well as a few fields from the more generic ontologies in Strigi.

Extracted metadata
Strigi currently focuses on extracted metadata only, as do the kfile_chemical plugins: they extract metadata from the file, and generally do not create new metadata based on the file (with the exception that Strigi does calculate SHA-1 hashes). These are typically fields like molecular formula, title, X-ray resolution (in case of PDB files), identifiers (e.g. InChI, PDB id), etc. However, there can be a lot more interesting information in those files, which requires some more thought. For example, PDB files cite one or more publications, which might be present on one's hard disk too. The idea here is that Strigi links the PDF of the publication with the PDB file. This is where Nepomuk comes in, and where Strigi currently falls short. Similarly, any general organic chemistry publication will mention many molecules, each of which might have other publications discussing them, or even have 3D coordinates or other properties defined.
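To make extracted metadata concrete: for PDB files, the identifier and the X-ray resolution can be read straight from the file. The hypothetical `pdb_metadata` function below takes the ID code from columns 63-66 of the `HEADER` record and the value from a `REMARK   2 RESOLUTION.` line, and prints them under the chemistry.pdbid and chemistry.xray_resolution field names mentioned above. The `key=value` output format is an assumption for illustration, not how kfile_chemical or Strigi report fields.

```bash
# Hypothetical helper, not kfile_chemical code: extract the PDB id and
# X-ray resolution from a PDB file and print them as key=value lines.
pdb_metadata() {
  awk '
    /^HEADER/ {
      id = substr($0, 63, 4)        # idCode lives in columns 63-66
      gsub(/ /, "", id)
      if (id != "") print "chemistry.pdbid=" id
    }
    /^REMARK   2 RESOLUTION\./ && $4 ~ /^[0-9.]+$/ {
      print "chemistry.xray_resolution=" $4   # value in Angstrom
    }
  ' "$1"
}
```

Entries without a numeric resolution (e.g. NMR structures) simply produce no resolution line.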

Created metadata
Another interesting thing one can do for chemical documents is to calculate metadata: for example, calculating InChIs for mol/xyz/hin/... files using OpenBabel, or Rule-of-Five properties using the CDK. This is where the GSoC project I am mentoring comes in, on which Alexandr (a former CUBIC student) is going to work.
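By contrast, created metadata is computed from the file content. The hypothetical `file_metadata` function below prints a SHA-1 hash (which Strigi also calculates) and, for a few chemical file extensions, an InChI computed with OpenBabel. It assumes the `obabel` command with its `inchi` output format is installed; `obabel` comes with newer OpenBabel releases (older ones ship `babel`), and the InChI step is silently skipped when it is not available.

```bash
# Hypothetical helper: print "created" metadata for a file, i.e. values
# computed from its content rather than read from fields inside it.
file_metadata() {
  local f=$1
  echo "sha1: $(sha1sum "$f" | cut -d' ' -f1)"
  case $f in
    *.mol|*.sdf|*.xyz|*.hin)
      # assumption: OpenBabel's obabel with InChI support is installed
      if command -v obabel > /dev/null 2>&1; then
        obabel "$f" -oinchi 2>/dev/null | sed 's/^/inchi: /'
      fi
      ;;
  esac
}
```

Files with other extensions only get the hash line.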

Oh, and like most desktop search tools, it can simply work on your HTML cache too, so that all these cool things will work on the web pages you visit as well. That should trigger some more ideas :) It does for me at least.

Saturday, April 07, 2007

A Chemical KDE desktop: the Google SoC

Two Google Summer of Code ideas have been written up for the KDE project, and students wrote 10 applications based on those. Today is an important day, as the final ranking, which is sent off to Google, will be determined. See my notes on this from a few days ago, and earlier posts in this blog. Both ideas have a reasonable chance of getting one student accepted, but the final decisions will not be made public before 11 April.

Thursday, March 08, 2007

Chemical KDE GSoC Project Ideas

KDE will very likely participate in the Google Summer of Code again. At this moment, two project ideas related to chemistry have been written up: The second is my personal favorite, and I have worked on this topic in the past myself. See also Carsten's post on this.

The only thing I really miss in the full list of ideas is completing a Greasemonkey plugin for Konqueror. I could not mentor a project like this, and therefore cannot put it on the idea list :(

Friday, February 23, 2007

KDE GUIs for OpenBabel's chemical file conversion

Jerome Pansanel has written Qt- and KDE-based GUIs around OpenBabel, allowing conversion between chemical file formats. You can download them here.

Together with the chemical mime type package and chemistry indexing support in Strigi, this should make KDE a perfect desktop for handling molecular data. Once more I won't be able to make it to FOSDEM, but I am looking forward to the transcripts.

Saturday, January 13, 2007

New: a Konqueror userscript plugin

Maybe neofreko read my blog item about userscripts for chemistry from about a week ago; either way, he has started working on userscript support in Konqueror. He seems to aim at Greasemonkey compatibility, and writes:
    Looking at the quality of my code, it's obvious that my C++/Qt/KDE skill is .. questionable :D. Thus, help from others would be definitely warmly accepted :)

I have not compiled it yet, but will do so soon to test the chemistry userscripts I wrote; comments so far have been positive. Cheers, neofreko!

Sunday, January 07, 2007

Firefox userscripts for chemistry, new chemical mime types, OpenBabel builds and Eigen

It has been some time since I blogged on this channel, as I had little to mention lately. The last two weeks have been interesting, however. With 3D molecules in Kalzium (and other things) in mind, Eigen was developed; it saw its 1.0 release this week. Carsten put OpenBabel 2.1 in Novell's RPM build service, bringing important chemoinformatics algorithms to one's desktop. Chemical awareness of our desktop environments is further tightened now that Daniel has made a new release of the chemical mime types package, and also provides a web page listing known chemical mime types.

If you consider Firefox to be part of the desktop, then I hope to have thrown in some interesting developments too. In the last few weeks I worked on semantic markup of chemical compounds in web pages, and on corresponding Greasemonkey and server-side JavaScripts to handle this markup. Both scripts automatically link to Google search and PubChem to increase the chemical integration of your desktop. Now, the big question: is there a Konqueror equivalent for userscripts? At least the server-side version works.
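The link targets themselves are simple to construct. Below is a shell illustration (not the Greasemonkey or server-side JavaScript itself) of how a compound name could be turned into a Google search link and a PubChem link. The `urlencode` and `compound_links` helpers are hypothetical; the PubChem URL pattern `https://pubchem.ncbi.nlm.nih.gov/compound/<name>` is an assumption based on the current PubChem site, not what the scripts used at the time; and the encoder only handles ASCII names.

```bash
# Hypothetical helper: percent-encode an ASCII string for use in a URL.
urlencode() {
  local s=$1 out= c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [a-zA-Z0-9.~_-]) out+=$c ;;                     # unreserved characters
      *) out+=$(printf '%%%02X' "'$c") ;;             # e.g. space -> %20
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper: print a Google search link and an (assumed)
# PubChem compound link for a compound name.
compound_links() {
  local q
  q=$(urlencode "$1")
  echo "https://www.google.com/search?q=${q}"
  echo "https://pubchem.ncbi.nlm.nih.gov/compound/${q}"
}
```

For example, `compound_links 'acetic acid'` prints two links with the name encoded as `acetic%20acid`.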