On Lucene in many languages: can't one write one's own analyzer? I see nine languages with their own analyzer class, including CJK; the CJK analyzer supposedly works for Indic languages too. It would be interesting to have a list of how well Lucene works for us in each language (linked to a 'how to build your own analyzer' page). Sj 05:43, 30 October 2006 (UTC)
-- To my knowledge, CJK text no longer needs a specific analyzer. Although Chinese text usually has word segmentation issues, Lucene's TF-IDF model naturally behaves somewhat like an n-gram model: characters that appear close to each other are treated as more related, like a word. For example, my passage retrieval system mainly uses character-based indexing with the StandardAnalyzer, plus word-based indexing with the WhitespaceAnalyzer for pre-segmented texts. This is still a good suggestion, however, since we don't know all languages. :) --b6s 14:10, 16 November 2006 (UTC)
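To make the two indexing strategies in the comment above concrete, here is a minimal sketch in plain Python. It does not use Lucene's API; the function names are hypothetical and only imitate, under that assumption, what character-based tokenization and whitespace tokenization of pre-segmented text would produce. The bigram helper is an extra illustration of the n-gram idea, not something the comment says its system does.

```python
def char_tokens(text):
    """Character-based indexing: one token per non-whitespace character.
    (Hypothetical helper; an approximation, not Lucene's StandardAnalyzer.)"""
    return [ch for ch in text if not ch.isspace()]


def whitespace_tokens(text):
    """Word-based indexing for pre-segmented text: split on whitespace only.
    (Hypothetical helper; mimics the idea behind WhitespaceAnalyzer.)"""
    return text.split()


def char_bigrams(text):
    """Overlapping character bigrams, shown only to illustrate the n-gram idea."""
    chars = char_tokens(text)
    return [a + b for a, b in zip(chars, chars[1:])]


if __name__ == "__main__":
    segmented = "中文 检索"            # text already segmented into words
    print(char_tokens(segmented))       # tokens for character-based indexing
    print(whitespace_tokens(segmented)) # tokens for word-based indexing
    print(char_bigrams("中文检索"))     # overlapping two-character tokens
```

With character tokens, a query for a multi-character word only matches as a word if the search also takes the adjacency of the characters into account, which is why the pre-segmented, whitespace-split index is kept alongside it.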
Say what WWDC means. --Jidanni 2007-01-04
- I assume Worldwide Developers Conference. Angela 23:57, 3 January 2007 (UTC)
- Yes, Angela is right. Since WWDC sessions are developer-oriented, they are always lab-style sessions held by Apple staff, with API introductions.
Has this been talked about with the devs?
Has anyone talked about this plan with the developers? This is very far away from the initial idea of the hacking days.
- We sent this plan to the mailing list three months ago and got exactly the same response as Brion's message on the planning page; here's the thread: http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/27576 . It's hard to say, but we need more practical suggestions. --b6s 06:49, 5 February 2007 (UTC)
- Shall we just plan the logistics and leave the event itself to the MediaWiki developers? --b6s 07:18, 5 February 2007 (UTC)