DjVu IFilter
The LizardTech DjVu IFilter is a tool that enables searching over DjVu document collections by keyword or phrase using Microsoft SharePoint or standard Microsoft Windows search functionality. In other words, it enables you to search over the "hidden text layer" that is found in most DjVu files.
If you are using Windows (NT 4.0, 2000 or XP), download and install our DjVu IFilter, and your system will instantly be able to search through the textual content of DjVu documents via the standard Windows search interface.
The DjVu IFilter also enables large repositories of DjVu files to be searched using applications based on the Microsoft Index Server. This includes Microsoft applications such as Site Server and SharePoint Server.
DjVu IFilter for Windows
LizardTech Introduces DjVu IFilter, Enabling Keyword Search over DjVu Document Collections Using Microsoft SharePoint or Windows Built-in Search Functionality
Locating information deep inside DjVu® documents just got easier, thanks to the new DjVu IFilter introduced today by LizardTech, a worldwide leader in software solutions that make it significantly easier to manage, distribute, and access digital content such as aerial photography, satellite images and color image documents.
The DjVu technology alleviates most problems commonly associated with other document formats and provides a no-compromise approach to color document scanning and interchange. Highly compressed true-quality images optimized for Web viewing mean quicker access to information without sacrificing quality, resolution, or legibility. The new DjVu IFilter gives consumers an easy way to search DjVu document collections by keyword or phrase, using Microsoft SharePoint or standard Microsoft Windows search functionality, to find precisely the information they are seeking.
The DjVu format is used by a number of hospitals and legal and financial institutions for archiving, accessing and exchanging complex color documents such as statements, articles, manuals, records, and catalogs. Realview Online Publishing System (ROPS), based in Sydney, Australia, provides an online publishing system and software that transforms newspapers, magazines and catalogs into stunning, high-quality online publications that look exactly the way they were printed and are stored in DjVu format.
"LizardTech's DjVu IFilter enables us to quickly search over large collections of publications stored in DjVu format," says Richard Lindley, CEO of Realview. "The indexing and search capability can be offered directly on our customer's web sites without the use of a database and is completely automated once installed. A flexible interface means that searches can be simple keywords or complex queries with vector weighted results."
"DjVu truly is the premier format for storing and exchanging scanned documents or complex electronic documents," said Luc Vincent, Vice President of Document Imaging at LizardTech. "Our DjVu IFilter now puts the text information contained within DjVu documents at your fingertips! End users and system administrators no longer need to set up their own keyword-search mechanism: the IFilter allows you to take advantage of the powerful search mechanisms already available in Windows and SharePoint, without any special configuration."
DjVu IFilter Release Notes
LizardTech™ DjVu® IFilter provides access to text in any DjVu document upon which optical character recognition (OCR) has been performed. The IFilter supports Microsoft Windows NT 4.0, 2000, and XP clients and servers that use the Microsoft Indexing Service. This means DjVu documents are included in queries by the Windows Search function, and repositories of DjVu files can be searched when using applications based on the Microsoft Index Server, such as the built-in Windows Indexing Service, Index Server, Site Server and SharePoint Server.
Following are system requirements and detailed instructions for installing DjVu IFilter on a range of servers including Index Server and SharePoint.
System Requirements
LizardTech DjVu IFilter 1.1 requires one of the following environments:
- Microsoft Windows NT 4.0 Server with Service Pack 3 (or higher) and Option Pack 4.
- Microsoft Windows 2000 Professional or Server.
- Microsoft Windows XP Professional or Server.
Installation Instructions
The LizardTech DjVu IFilter is packaged in a self-extracting installer that must be downloaded and run on the machine where you wish to use it. Version 1.1 of this installer is called “DjVuIFilter11.exe”.
To install for use with the Windows file search feature, simply run the installer on your machine. You will then be able to go to Start > Search and search for words within DjVu documents.
To use the IFilter in a server solution, follow the platform-specific instructions below.
Windows 2000 and XP
- Stop all appropriate clients by one or more of the following methods:
  - Use the Indexing Service snap-in to the Computer Management console. From the Action menu, choose Stop.
  - Use the Services snap-in to the Computer Management console. In the Results pane, right-click on the service named Site Server Search and choose Stop.
  - For SharePoint: Use the Services snap-in to the Computer Management console. In the Results pane, right-click on the service named Microsoft Search and choose Stop.
- Uninstall any previous (pre-release) version of LizardTech DjVu IFilter.
- Double-click the installer program and follow the onscreen instructions.
- After the installation process finishes, start all appropriate clients with one or more of the following methods. For Site Server clients, there are two parts to the process and both are required.
  If using the Windows Indexing Service or Index Server:
  - Use the Index Server snap-in to the Microsoft Management console. From the Action menu, choose Start.
  - Use the Index Server snap-in to the Microsoft Management console. To index DjVu files in all catalogs, right-click on Indexing Service; to index them in a single catalog, right-click on that catalog. Select Properties from the popup menu. Open the Generation tab. Select the “Index files with unknown extensions” checkbox.
  If using Site Server:
  - Use the Services snap-in to the Computer Management console. In the Results pane, right-click on the service named Site Server Search and choose Start.
  - Use the Search (Site Server) snap-in to the Computer Management console (found at Start > Settings > Control Panel > Administrative Tools > Administration > Site Server Service Admin (MMC)). Open the appropriate catalog in the Scope pane and select the Catalog Build Server node. In the Scope pane, right-click on each virtual directory desired and choose Properties. Select the File Types tab. If the "DjVu" file extension is not included in the list of file types, click the Add button. Follow the instructions to Add File Type Extension: DjVu. Click OK to close the dialog box. Note: this step should not be necessary when reinstalling a new version of the LizardTech DjVu IFilter. For Index Server, make sure your catalog includes files with unknown extensions.
  If using SharePoint:
  - Use the Services snap-in to the Computer Management console. In the Results pane, right-click on the service named Microsoft Search and choose Start.
- Re-index your site with all appropriate clients by one or more of the following methods:
  - For Windows Indexing Service or Index Server: Use the Indexing Service snap-in to the Computer Management console. Open the appropriate catalog in the Scope pane and select the Directory node. In the Results pane, right-click on the virtual directory which contains your DjVu files. Select the All Tasks menu item and choose Rescan (Full).
  - For Site Server: Use the Search (Site Server) snap-in to the Computer Management console. Open the appropriate catalog in the Scope pane and select the Catalog Build Server node. In the Scope pane, right-click on the virtual directory which contains your DjVu files. Select the All Tasks menu item and choose Start Build.
  - For SharePoint: Open the snap-in from Start > Administrative Tools > SharePoint Portal Server Administrator. On the appropriate server, right-click on the workspace which contains your DjVu files. Select the All Tasks menu item and choose Start Full Update.
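The stop and start steps above can also be scripted. The following is a minimal Python sketch, not part of the LizardTech documentation, that builds `net stop` and `net start` command lines from the service display names used in these instructions. It assumes that the display names on your server match the ones shown here and that the script runs with administrative rights. The dictionary keys and function names are hypothetical, chosen only for this illustration. By default it only prints the commands (a dry run).

```python
import subprocess

# Display names as they appear in the Services snap-in (taken from the steps above).
# Assumption: these match the names on your server; adjust them if they differ.
SERVICES = {
    "indexing": "Indexing Service",
    "siteserver": "Site Server Search",
    "sharepoint": "Microsoft Search",
}


def service_commands(action, clients):
    """Return one `net <action> <service>` command (as an argument list) per client."""
    if action not in ("stop", "start"):
        raise ValueError("action must be 'stop' or 'start'")
    return [["net", action, SERVICES[client]] for client in clients]


def run(action, clients, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in service_commands(action, clients):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)


if __name__ == "__main__":
    # Dry run: show what would be executed before and after the installer step.
    run("stop", ["indexing", "sharepoint"])
    run("start", ["indexing", "sharepoint"])
```

Run the stop commands before launching the installer and the start commands afterwards. The catalog settings and the re-index step still have to be done in the snap-ins as described above.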
NT 4.0
- Stop all appropriate clients by one or more of the following methods:
  - Use the Index Server snap-in to the Microsoft Management console. From the Action menu, choose Stop.
  - Use the Services Control Panel. In the dialog box, select Site Server Search and click the Stop button.
- Uninstall any previous (pre-release) version of LizardTech DjVu IFilter.
- Double-click the installer program file and follow the onscreen instructions.
- After the installation process finishes, start all appropriate clients with one or more of the following methods. For Site Server clients, there are two parts to the process and both are required.
  If using Windows Indexing Service or Index Server:
  - Use the Index Server snap-in to the Microsoft Management console. From the Action menu, choose Start.
  - Use the Index Server snap-in to the Microsoft Management console. To index DjVu files in all catalogs, right-click on Indexing Service; to index them in a single catalog, right-click on that catalog. Select Properties from the popup menu. Open the Generation tab. Select the “Index files with unknown extensions” checkbox.
  If using Site Server:
  - Use the Services Control Panel. In the dialog box, select Site Server Search and click the Start button.
  - Use the Search (Site Server) snap-in to the Microsoft Management console. Open the appropriate catalog in the Scope pane and select the Catalog Build Server node. Right-click on each virtual directory desired and choose Properties. Select the File Types tab. If the "DjVu" file extension is not included in the list of file types, click the Add button. Follow the instructions to Add File Type Extension: DjVu. Click OK to close the dialog box. Note: this step should not be necessary when reinstalling a new version of the LizardTech DjVu IFilter.
- Re-index your site with all appropriate clients (with one or more of the following methods):
  - For Windows Indexing Service or Index Server: Use the Index Server snap-in to the Microsoft Management console. Open the appropriate catalog in the Scope pane and select the Directory node. In the Results pane, right-click on the virtual directory which contains your DjVu files. Select the Rescan menu item and choose Full Rescan.
  - For Site Server: Use the Search (Site Server) snap-in to the Microsoft Management console. Open the appropriate catalog in the Scope pane and select the Catalog Build Server node. In the Results pane, right-click on the virtual directory which contains your DjVu files. Select the All Tasks menu item and choose Start Build.
Testing
You can check the progress of the indexing by selecting the Indexing Service under Computer Management. The service may take some time to index the files depending on server load. When the Docs to Index column has reached 0, all the files have been indexed.
To check the index, select the catalog you have just created, and click Query the Catalog. Type in a word you know exists in one of the files and the files containing that word should be displayed.
If you have set this up as part of the base Index Server, you should also be able to go to Start>Search and search for a file with a word or a phrase in it.
Troubleshooting
If your search method does not find text in a DjVu file after you install the LizardTech DjVu IFilter:
- Restart your server machine.
- Re-index your chosen directories. For example, using the Index Server snap-in to the Microsoft Management console (MMC), open the appropriate catalog in the Scope pane and select the Directory node. Then, in the Results pane, right-click on the virtual directory which contains your DjVu files, select the Rescan menu item and choose the Full Rescan mode. If that does not work, stop and restart Index Server using the Action menu of the MMC. Should that fail to produce a good Index Server catalog, disable indexing at the root of the directory using the IIS snap-in to the MMC, stop and restart Index Server, and reapply indexing at the root of the directory. It may even be necessary to stop and restart Index Server once more after that step. Additionally, you may use the Merge Index button in the HTML Index Server Manager.
- Verify that the DjVu file contains text. You can detect whether there is text in a DjVu file by opening it in the DjVu Browser Plugin. With the text selection tool, drag a rectangle around a region of the page containing characters. If you cannot highlight any characters, then no optical character recognition (OCR) has been performed on the DjVu file and it contains no text for the LizardTech DjVu IFilter to index. You can perform OCR on your document using DjVu Editor (part of the Document Express Professional suite) or using DjVuJoin/DjVuBundle (part of the Document Express Enterprise suite).
- Make sure the DjVu file has a ".DjVu" filename extension. Clients such as Index Server find the LizardTech DjVu IFilter by looking up the filename's extension in the Windows Registry and may not work with .djv.
- For further help with Microsoft Index Server, refer to Microsoft Knowledge Base Article 309173.
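The two file-level checks above (text layer present, ".DjVu" extension) can be roughly automated across a whole folder. The following is a minimal Python sketch, not a LizardTech tool. It relies on the DjVu format storing the hidden text layer in chunks identified as `TXTa` or `TXTz`, and simply looks for those identifiers in the raw file bytes. Treat the result as a heuristic: the byte sequence could occur by chance in compressed data, and the index file of an indirect document may be reported as having no text because the text lives in its page files. The function names are hypothetical.

```python
import os

# Chunk identifiers of the DjVu hidden text layer: TXTa (plain) and TXTz (compressed).
TEXT_CHUNK_IDS = (b"TXTa", b"TXTz")


def has_text_chunk(path):
    """Heuristic: report whether the raw file bytes contain a text chunk identifier."""
    with open(path, "rb") as f:
        data = f.read()
    return any(chunk_id in data for chunk_id in TEXT_CHUNK_IDS)


def check_folder(folder):
    """Return (files with a .djv extension, .djvu files with no text chunk found)."""
    wrong_extension, no_text = [], []
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            path = os.path.join(root, name)
            ext = os.path.splitext(name)[1].lower()
            if ext == ".djv":
                wrong_extension.append(path)
            elif ext == ".djvu" and not has_text_chunk(path):
                no_text.append(path)
    return wrong_extension, no_text


if __name__ == "__main__":
    import sys

    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    wrong_extension, no_text = check_folder(folder)
    for path in wrong_extension:
        print("Rename to .djvu:", path)
    for path in no_text:
        print("No text layer found (run OCR?):", path)
```

Files reported in the first group need renaming before the indexing client will hand them to the IFilter. Files in the second group should be confirmed with the text selection tool and, if needed, run through OCR, then re-indexed.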
Abstract Generation (Advanced)
It is possible to get an abstract from a DjVu file using the Indexing Service. This is particularly helpful when using Microsoft SharePoint.
To get an abstract from a DjVu file:
- Open Computer Management and select Indexing Service.
- Select the catalog you created, right-click and select Properties.
- On the Generation tab, clear (deselect) the "Inherit above settings from Service" option, then select Generate abstracts.
- You can now access the abstract if you use the Characterization property in a search application. A simple .ASP example is listed below:
To use this example, copy the text between the two asterisk lines (*************) below into a new text file and save it in your WWWRoot folder as an ASP page (for example, searchdjvu.asp). Change QS.Catalog = "djvu" to match the name of the catalog you have created.
*************
<HTML>
<HEAD>
<TITLE>Search DjVu files using Index Server - Sample</TITLE>
</HEAD>
<BODY>
<%
' Read the search term submitted by the form below.
Dim searchtext
searchtext = Request.QueryString("searchtext")
%>
Enter the term you want to search for and press the 'Submit Query' button:<br><br>
<form name="DjVuSearch" method="GET">
<input type="text" name="searchtext" value="<%=Server.HTMLEncode(searchtext)%>"><br>
<input type="submit" name="SubmitButton">
</form>
<%
If searchtext <> "" Then
  Dim QS, RS, I
  Set QS = Server.CreateObject("ixsso.Query")
  QS.Catalog = "djvu" ' Change to the name of your catalog
  QS.Columns = "vpath, DocTitle, FileName, Path, Write, Size, Characterization"
  QS.Dialect = 2
  QS.Query = searchtext
  ' Issue query
  On Error Resume Next
  Set RS = QS.CreateRecordSet("nonsequential")
  If RS.RecordCount <> 0 Then
    ' The search term is HTML-encoded before it is echoed back to the page.
    Response.Write("There are " & RS.RecordCount & " matches for search string [ " & Server.HTMLEncode(searchtext) & " ]<br><br>")
    For I = 1 To RS.RecordCount
      ' Link to each matching file, followed by its abstract (the Characterization property).
      Response.Write("<A HREF='file://" & RS("path") & "' target='_blank'>" & RS("filename") & "</A><br>")
      Response.Write("Summary " & RS("characterization") & "<br><br>")
      RS.MoveNext
    Next
  Else
    Response.Write("There were no matches")
  End If
End If
%>
</BODY>
</HTML>
*************
Microsoft Developer Support
An IFilter is an ActiveX control called by a client to extract text from a given file format. The LizardTech DjVu IFilter consists of code that understands the DjVu file format along with code that provides the appropriate interface to clients such as Index Server. These clients use the text returned by IFilters to build indexes and support queries against those indexes.
To obtain more information about the IFilter specification, visit Microsoft's web site.
LizardTech Technical Support Options for the LizardTech DjVu IFilter
Technical support for the LizardTech DjVu IFilter is available as a fee-based, pay-as-you-go option. For more information on pay-as-you-go technical support options and for a list of technical support phone numbers, please see http://www.lizardtech.com/support/
If you have Internet access, you have round-the-clock access to free technical information online. Visit the LizardTech Web site at http://www.lizardtech.com/support/doc/ to search our technical support databases, participate in user-to-user forums, or download free plug-ins, filters, or updates.
Language Support
LizardTech DjVu IFilter has no user interface and therefore is language-agnostic. It is tested and supported in Tier 1 languages (English, French, German, and Japanese), utilizing operating systems in the above languages and text within LizardTech DjVu documents in the above languages. The LizardTech DjVu IFilter is based on the Microsoft indexing client, which is responsible for interpreting the returned text and then presenting the information to the user.
Searching by Metadata
Not supported.
Known Issues
The Highlight Hits feature in Microsoft Index Server cannot highlight text in a DjVu file opened in the DjVu Browser Plugin.
Text in Indirect DjVu files is indexed twice: once as part of the indirect document and once again as a single page document.
Release Notes
1.1
Internationalized installer and localization to English and Japanese.
1.0
Initial release.