# NAME

UNB's Luci - The eLUCIdator

# INSTALL

## REQUIREMENTS

The following Perl modules are required by Luci. All are available from http://www.cpan.org/.

NOTE 1: The versions listed here are those that were used during development. Luci may work with an older version of the same module, but performance with older versions should be considered uncertain.

NOTE 2: It is possible that some of the modules listed here have module requirements of their own.

## SETUP & CONFIGURATION

1. Un-archive the distribution

Un-tar the archive to a directory from which you wish to run Luci. You should consider installing Luci behind SSL. ( see the FEATURE LIST for information on how Luci works with SSL and why Luci should reside behind a Secure Socket Layer. )

2. Modify the config

Using your favorite editor, open the luci.conf.cgi and update the following parameters in accordance with your environment:

NOTE: These settings are required for Luci to work properly. Any parameter within the config file may be modified to tweak Luci to best suit your needs, but you only need to change the required list to get Luci up and running.

- app_root - absolute path under which you've installed the distribution.

        ex: /web/text_only/parser

- app_root_url - absolute URL of the application root directory. ( NOTE: the 's' in http's' is recommended, yet not necessary. Please see the FEATURE LIST for information on how Luci works with SSL and why Luci should reside behind a Secure Socket Layer. )

        ex: https://www.yourdomain.com/text_only/parser

Under <system>

- default_target - absolute URL of the default site where Luci should be directed. This site will be used if a user references the luci.cgi directly.

	ex: http://www.yourdomain.com/

- apache_2 - set true if running under apache 2, false otherwise

- allow_url - urls that Luci will treat as internal ( ie. allowed for parse ). This directive should be repeated for every url under which Luci should allow for parsing. If you leave this field empty, Luci will allow parsing of any domain. See luci.conf.cgi for specific info on how to set this directive.

        ex: allow_url = http://yourdomain.com/*
            allow_url = http://yourotherdomain.com/*

- deny_url - urls that Luci will not allow for parse. This directive follows the same conventions as the allow_url directive. Note: deny_url overrides settings in the allow_url directive. If you leave this field empty, Luci will allow parsing of any url permitted by allow_url. See luci.conf.cgi for specific info on how to set this directive.

        ex: deny_url = http://yourdomain.com/personal/*
            deny_url = http://yourdomain.com/personal/mypage.html

- cryptkey - the encryption key is used with Crypt::CBC to generate parameter names, and also to en/decrypt passwords used in conjunction with 401 authorization. ( ie. .htaccess ) If you are not familiar with this, set it to some random string. As far as we know, cryptkey can be any string you like, but do not use the example given here. ( see the FEATURE LIST for information on how 401 authorization works with Luci and why you need to set cryptkey. )

        ex: cryptkey = aZ4eg3P

- domain - the domain under which you've installed Luci.

        ex: yourdomain.com

- secure - Set to 1 if Luci is running under SSL. Used with cookies set by Luci. According to CGI::Cookie: "If the 'secure' attribute is set, the cookie will only be sent to your script if the CGI request is occurring on a secure channel, such as SSL."
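The interaction between allow_url and deny_url described above can be sketched as follows. This is an illustration in Python, not Luci's own Perl code: the function name is_parse_allowed is hypothetical, and treating '*' as a shell-style wildcard is an assumption ( luci.conf.cgi documents the actual pattern rules ).

```python
from fnmatch import fnmatchcase


def is_parse_allowed(url, allow_url=(), deny_url=()):
    """Return True if `url` may be parsed under the given pattern lists.

    Hypothetical helper: assumes '*' behaves like a shell wildcard.
    """
    # deny_url overrides allow_url, so it is checked first.
    if any(fnmatchcase(url, pattern) for pattern in deny_url):
        return False
    # An empty allow list means any url that is not denied is allowed.
    if not allow_url:
        return True
    return any(fnmatchcase(url, pattern) for pattern in allow_url)


allow = ["http://yourdomain.com/*", "http://yourotherdomain.com/*"]
deny = ["http://yourdomain.com/personal/*"]

print(is_parse_allowed("http://yourdomain.com/index.html", allow, deny))            # True
print(is_parse_allowed("http://yourdomain.com/personal/mypage.html", allow, deny))  # False
print(is_parse_allowed("http://elsewhere.example/", allow, deny))                   # False
```

Checking the deny list first is what makes deny_url win when a url matches both directives.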

You should now be ready to test the install. See USAGE.

3. Hiding the config ( optional - but highly recommended )

The configuration file ( luci.conf.cgi ) contains some information which you may not want publicized; namely the cryptkey parameter. ( see 401 authorization in the FEATURE LIST for more information on cryptkey and why it should be hidden ) The following methods may be used to hide the config. ( This is by no means a complete list, so please feel free to send us any methods you feel should exist with this list, along with the OS used )

- hide luci.conf.cgi from your web server document root - By moving the config file to a directory outside the web tree, anonymous users will not be able to access your configuration details. To do this, first move luci.conf.cgi to some directory outside of your web server document root, then edit both luci.cgi and index.cgi, and update the following line:

        use constant LUCI_CONF => $path.$sep.'luci.conf.cgi';

so that it points at the new location:

        use constant LUCI_CONF => "/path/to/hidden/luci.conf.cgi";

- run setuid - By running Luci setuid, you can probably leave the config under the web server document root, as long as the permissions are set properly.

NOTE: When running setuid, Luci runs as the owner of the Luci scripts, and therefore has those permissions available to that user on the system. Please see the setuid man page for more information on how it works. When running setuid you should probably run the application with a user account that has minimal permissions on the system.

NOTE: As of version 1.3, we have renamed the configuration file with the appended '.cgi'. This should allow you to leave the file in place if you serve perl generated pages with the cgi extension. In this case it is recommended that you set it with read only permissions so any user attempting to read the file with their browser will get a forbidden error.

4. Hiding Luci from web crawler robots ( optional - but highly recommended )

In release 1.2 and earlier, we encountered an issue where web crawling engines that found an instance of Luci would spawn a large number of invocations attempting to spider websites via this service. In an environment where Luci is hosted without any url restriction, this was a real problem: the crawler would saturate the host, using all available resources and in turn denying a response to any other request. The issue is that crawling engines see sites reached via Luci as absent from their datastore, and so begin indexing them, even though they should be considered duplicates. This should be a non-issue for smaller sites that use the allow_url directive in the configuration file, but for larger sites, or those that do not make use of the allow_url directive, this can be a problem. For example, the googlebot engine is not only fast, but also runs in a distributed environment; if it were allowed to spider the internet via your Luci instance, it could quite quickly tap all resources, in turn bringing your server to a grinding halt.

The solution has two parts:

- a - As per the googlebot documentation, we've added the meta content necessary to hide your Luci pages from a crawler that may attempt indexing via your site. This is available as an option in the config, and is by default turned on. If you have a small website, and do make use of the allow_url directive in your luci.conf.cgi, you shouldn't be too concerned about robots because they will only index those pages allowed by your Luci installation.

- b - For larger sites, or those allowing open access by not using the allow_url directive, it would be good practice to at minimum leave the option to include the nofollow meta turned on as is described in - a - above. Additionally, you can include a robots.txt file at the root of your site which is the standard method used by a website admin to communicate with a crawler engine.

As an example: if you install Luci under https://www.yourdomain.com/text_only/parser/, then you would create a robots.txt file at https://www.yourdomain.com/robots.txt. The entries necessary to hide this installation are as follows:

        User-agent: *
        Disallow: /text_only/parser

NOTE: The robots.txt file *MUST* exist at the root of your site, else it will have no effect.

For information on robot exclusion via robots.txt and the meta nofollow, see: http://www.robotstxt.org/wc/exclusion.html

For detailed information on Web Robots, see: http://www.robotstxt.org/wc/faq.html
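Before publishing the robots.txt entries shown above, you may want to confirm that they hide the install path. The following is a minimal sketch using Python's standard urllib.robotparser module; the luci.cgi and index.html URLs are illustrative assumptions based on the example install location, and real crawlers may interpret rules slightly differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The same entries as in the example robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /text_only/parser
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Illustrative URLs, assuming the example install location.
luci_url = "https://www.yourdomain.com/text_only/parser/luci.cgi"
other_url = "https://www.yourdomain.com/index.html"

print(parser.can_fetch("Googlebot", luci_url))   # False: Luci is hidden
print(parser.can_fetch("Googlebot", other_url))  # True: the rest of the site is not
```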

## PLATFORM SPECIFIC

Below is a list of environments under which we know Luci will run. With each is a description of what software versions were used, and what was required to get Luci running. Some systems have special requirements:

- UNIX/Linux/OS X - Luci was developed in a UNIX environment, and should work without issue. See BUGS for reporting issues found when running/installing Luci in a UNIX environment.

- Windows 2003 Server Version 5.2.3790 Build 3790 : Service Pack 1 :: Processor - x86 Family 6 Model 6 Stepping 5 GenuineIntel ~500 Mhz

  1. Perl
     - Install ActivePerl (we used 5.8.8.817-MSWin32-x86-257965)
       - top level dir: c:\Perl
       - install all options
         - Luci requires the 'Add Perl to the PATH environment variable' option
     - Using the perl package manager:
       - install config-general
       - install html-template
       - install crypt-cbc
       - install http://theoryx5.uwinnipeg.ca/ppms/Crypt-Twofish_PP.ppd
       - install http://theoryx5.uwinnipeg.ca/ppms/Crypt-SSLeay.ppd
         - Fetch ssleay32.dll? yes
         - Where should ssleay32.dll be placed? C:\Perl\bin
         - Fetch libeay32.dll? yes
         - Where should libeay32.dll be placed? C:\Perl\bin
         - see item 4 below for information on permissions required by these two files
  2. IIS
     - version 6.0 comes with this version of Windows 2003
     - configure IIS 6 to support ActivePerl as per the following document:
       - ActivePerl 5.8 - Online Docs : Web Server Configuration
     - note the following when working through the configuration document above
       - New > Virtual Directory settings:
         - Virtual Directory Alias: 'cgi-bin'
         - Web Site Content Directory: c:\path\to\cgi-bin (ex: C:\Inetpub\cgi-bin)
         - Virtual Directory Access Permissions: check Read and Execute
       - Virtual Directory > Properties
         - Virtual Directory Tab > Configuration > Mappings:
           - Executable: c:\Perl\bin\perl.exe -T "%s" %s
           - Extension: .cgi
         - Documents Tab:
           - check Enable default content page
           - Default content page: index.cgi
       - Luci uses taint checking. You can either remove the -T from the perl invocation at the top of both index.cgi and luci.cgi, or, when you add the mapping, specify -T before "%s" %s
         - ex: C:\Perl\bin\perl.exe -T "%s" %s
  3. Luci
     - extract (install) luci in your cgi-bin
     - rename the folder 'luci'
     - setup your luci.conf.cgi as described in SETUP & CONFIGURATION
     - test your install as described in USAGE
  4. Possible errors and solutions
     - LWP will support https URLs if the Crypt::SSLeay module is installed.
       - 501 Protocol scheme 'https' is not supported (Crypt::SSLeay not installed).
       - The IIS internet guest account (iusr) must have 'Read and Execute' permissions set on the Crypt-SSLeay dlls, namely ssleay32.dll and libeay32.dll. To set permissions, locate these files, choose properties from the context menu, and under the security tab, add the iusr account to the user names menu with 'Read and Execute' permissions checked.
  5. Other
     - The following document was provided by John Newman from http://www.newluna.com/, and may be of assistance for those installing under Windows 2003: http://luci.sourceforge.net/other/win2003_05_30_2006.pdf
     - If you intend to perform a non-network install, you will need to retrieve the modules and port them to your local machine manually. We cannot provide them here, and would also prefer that the latest version of the modules be used.

- Windows 2000 Server Version 5.0.2195 Build 2195 :: Processor - x86 Family 6 Model 6 Stepping 5 GenuineIntel ~500 Mhz

  1. Perl
     - Install ActivePerl (we used 5.8.6.811-MSWin32-x86-122208)
       - top level dir: c:\Perl
       - y to all options (especially env var)
     - Using the perl package manager:
       - install config-general
       - install html-template
       - install crypt-cbc
       - install http://theoryx5.uwinnipeg.ca/ppms/Crypt-Twofish_PP.ppd
       - install http://theoryx5.uwinnipeg.ca/ppms/Crypt-SSLeay.ppd
         - Fetch ssleay32.dll? yes
         - Where should ssleay32.dll be placed? C:\Perl\bin
         - Fetch libeay32.dll? yes
         - Where should libeay32.dll be placed? C:\Perl\bin
         - see item 4 below for information on permissions required by these two files
  2. IIS
     - version 5.0 comes with this version of Windows 2000
     - configure IIS 5 to support ActivePerl as per the following document:
       - ActivePerl 5.8 - Online Docs : Web Server Configuration
     - note the following when working through the configuration document above
       - New > Virtual Directory settings:
         - Virtual Directory Alias: 'cgi-bin'
         - Web Site Content Directory: c:\path\to\cgi-bin (ex: C:\Inetpub\cgi-bin)
         - Virtual Directory Access Permissions: check Read and Execute
       - Virtual Directory > Properties
         - Virtual Directory Tab > Configuration > App Mappings:
           - Executable: c:\Perl\bin\perl.exe -T "%s" %s
           - Extension: .cgi
         - Documents Tab:
           - check Enable default content page
           - Default content page: index.cgi
       - Luci uses taint checking. You can either remove the -T from the perl invocation at the top of both index.cgi and luci.cgi, or, when you add the mapping, specify -T before "%s" %s
         - ex: C:\Perl\bin\perl.exe -T "%s" %s
  3. Luci
     - extract (install) luci in the web tree
     - rename the folder 'luci'
     - setup your luci.conf.cgi as described in SETUP & CONFIGURATION
     - test your install as described in USAGE
  4. Possible errors and solutions
     - We've yet to encounter any installation errors while following the directions provided here. If any are found, please let us know so we can update the documentation.

## USAGE

- test configuration - Once you have Luci installed and configured to work on your server, using Luci is quite simple. To test your configuration, simply point your web browser at the Luci install directory. You should see a 'text only' version of the default_target which was set in the configuration file.

- adding accessibility links to your pages - Using the provided index.cgi you can quite easily add accessibility links to all your pages. Use the following in your HTML to provide quick and easy access to Luci. The index.cgi will take care of forcing Luci to parse the appropriate page.

        <a href="https://www.yourdomain.com/path/to/luci/">Accessible Version</a>

The following icons are also available for linking to Luci: ( download them from here or simply reference them via the images directory that came with the distribution )

        <img src="http://www.yourdomain.com/path/to/luci/images/luci_blue.gif" border="0" width="30" height="30" alt="luci" id="luci">
        <img src="http://www.yourdomain.com/path/to/luci/images/luci_white.gif" border="0" width="30" height="30" alt="luci" id="luci">
        <img src="http://www.yourdomain.com/path/to/luci/images/luci_black16.gif" border="0" width="16" height="16" alt="luci" id="luci">
        <img src="http://www.yourdomain.com/path/to/luci/images/luci_black20.gif" border="0" width="20" height="20" alt="luci" id="luci">

# DESCRIPTION

## What ( or who ) is Luci?

Luci is the University of New Brunswick, Canada's enterprise website accessibility solution.

Luci is the bright, young, colonial cousin of the venerable dowager Betsie, the BBC's Education Text to Speech Internet Enhancer. While still bearing a family resemblance to Betsie, Luci has been completely re-written, mainly to accommodate SSL. ( see SSL in the FEATURE LIST for more information on why we wrote Luci )

Luci is clear, plain, simple and easy to use.

## What does she do?

Luci allows you to change the way your browser views web pages by simplifying their content into a well-structured, text-only format, mainly for accessibility purposes. Luci works equally well whether you wish to change the font-size and colour scheme of your display to make it easier to read, or want to send it to a text-reader.

Once in Luci's unified text rendering view, the user has several options for adjusting the application's display settings ( by choosing font, colour scheme, font size, and line height ). These settings are maintained as you continue to browse within the allotted domain(s), or until you switch back to a Graphical Version of the site.

## What's up with the name?

Unlike Betsie, Luci is not an acronym. Luci gets her name from the word 'elucidate'.

Elucidator ( noun )

        1. One who explains or elucidates;

Elucidate ( verb )

        1. To make clear or plain; clarify.

Clear, plain, simple and easy to use - Elucidator is a no-brainer.

# FEATURE LIST

Luci has many features; those we've deemed most important are listed here. For more detailed information on what Luci can do, see the source and the config. You may also find some detail in the changelog.

• Secure Socket Layer ( SSL - httpS )

- Why SSL? - At our institution, a very important component of our core services is made available to our clients ( namely students ) via secure web. To make these services accessible, we required a parser that could accommodate SSL.

At the time of this writing, Betsie was not capable of parsing secure content. An attempt was made to modify Betsie, yet after much ado we decided that we would benefit more from a full re-write: we could add the ability to parse encrypted content and take advantage of features that may not have been available or feasible when Betsie was originally written. ( ex: OO concepts, certain Perl Modules, etc... )

- Luci should reside behind SSL - In Figures 1 and 2, you can see how Luci works. Basically, Luci acts as an intermediary between the user and some web site. In both diagrams, you can easily see that Luci may communicate with either secure or non-secure web servers.

        user <------ http -----> Luci <----- http(s) -----> web server
        Figure 1: http://yourdomain.com/path/to/luci/

        user <------ httpS -----> Luci <----- http(s) -----> web server
        Figure 2: httpS://yourdomain.com/path/to/luci/

In Figure 1, under http ( non-SSL ), the communication between the user and Luci is not encrypted, and therefore, not secure. This can be dangerous because those users accessing secure content using Luci in this scenario would be transferring data intended for encryption across an un-encrypted connection.

In Figure 2, Luci is hosted in a secure environment, and therefore communication between the user and Luci is encrypted. This is how Luci should be hosted.

Because of these security concerns, we've added a check in Luci that will fail parsing of secure pages when Luci is hosted from http. In this case, users will be presented with an error message when attempting to parse a secure document. See DISCLAIMER.
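The rule behind this check can be sketched as follows. This is an illustration in Python rather than Luci's actual Perl code; the function name may_parse is hypothetical, and the sketch assumes the decision depends only on the two URL schemes.

```python
from urllib.parse import urlsplit


def may_parse(luci_url, target_url):
    """Return False when a secure page is requested through a non-secure Luci.

    Hypothetical helper that mirrors the check described above.
    """
    luci_scheme = urlsplit(luci_url).scheme.lower()
    target_scheme = urlsplit(target_url).scheme.lower()
    # Only one combination is refused: https target, non-https Luci ( Figure 1 ).
    return not (target_scheme == "https" and luci_scheme != "https")


print(may_parse("http://yourdomain.com/path/to/luci/", "https://secure.example/"))   # False
print(may_parse("https://yourdomain.com/path/to/luci/", "https://secure.example/"))  # True
print(may_parse("http://yourdomain.com/path/to/luci/", "http://plain.example/"))     # True
```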

• Frames and Frames Page Refresh

When navigating a framed page, Luci will maintain the frameset, emulating what a user sees ( framewise ) when browsing with a non-text-only client. We've also added a feature that will force a frameset reload if a user changes their style settings. ( javascript required for this feature to work )

• Cookies

Luci not only stores all cookies that the user comes across in one single cookie, but also stores any information pertaining to the creation of those cookies within that cookie. We also have our own auth scheme which requires setting a cookie to maintain authorization. ( see 401 auth below )

- size and excessive use -

The cookie specification, RFC2109 ( see: http://rfc.net/rfc2109.html ), states that a user agent must accommodate cookies of at LEAST 4 kb in size. ( quashing the myth that browsers only support cookies up to 4 kb in size ) ( see: http://search.cpan.org/~gaas/libwww-perl-5.800/lib/HTTP/Cookies.pm#METHODS for details on cookie creation ) As a result, the application should work very well where cookie sizes are moderate. How a browser behaves once the cookie size surpasses the required 4 kb minimum is up to the browser. ( truncation is a definite possibility ) Therefore, application performance with respect to cookie size and excessive use is uncertain, and most likely depends on the user's browser.
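To get a feel for how close a combined cookie comes to the 4 kb minimum, its size can be measured as sketched below. This is an illustration in Python using the standard http.cookies module, not part of Luci; the cookie name 'luci' and the helper names are assumptions.

```python
from http.cookies import SimpleCookie

# Minimum per-cookie size a user agent must support under RFC 2109.
RFC2109_MIN_COOKIE_BYTES = 4096


def cookie_size(name, value):
    """Size in bytes of the 'name=value' pair as it would be sent."""
    cookie = SimpleCookie()
    cookie[name] = value
    # OutputString() renders the cookie without the 'Set-Cookie:' header name.
    return len(cookie[name].OutputString().encode("utf-8"))


def within_guaranteed_size(name, value):
    """True if every RFC 2109 compliant browser must accept this cookie."""
    return cookie_size(name, value) <= RFC2109_MIN_COOKIE_BYTES


print(cookie_size("luci", "x" * 100))              # 105
print(within_guaranteed_size("luci", "x" * 100))   # True
print(within_guaranteed_size("luci", "x" * 5000))  # False
```

A cookie above the limit may still work in a given browser; the check only tells you when you have left the range that the specification guarantees.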

• 401 Authorization

When a user navigates a site that requires 401 authorization ( ex: .htaccess ), Luci will provide them with our own homegrown login screen. Once the user passes authorization, the credential information is encrypted, and stored in an authorization cookie.

Using this model, you cannot be logged into more than one 401 site at a time. Luci passes the credential information to the server upon each and every 401 request. If the user navigates to a separate domain, they will be prompted for their credentials with respect to that domain, and any existing authorization data will be overwritten.

The cryptkey parameter in the luci.conf.cgi is the key that is used to en/decrypt the credential information that is stored in the authorization cookie. As stated, you should change the cryptkey and hide the luci.conf.cgi from being viewed via the web. If your cryptkey is publicly available ( ie. if you use the one provided with the download ), and some intruder manages to steal the authorization cookie from one of your users, they could then easily decrypt the data and obtain their credential information.
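One way to produce a cryptkey that is hard to guess is to generate it from a cryptographically secure random source. The sketch below uses Python's standard secrets module; the helper name generate_cryptkey is hypothetical, and it is an assumption that Luci accepts a key of this length and character set ( letters, digits, '-' and '_' ).

```python
import secrets


def generate_cryptkey(num_bytes=24):
    """Return a random, URL-safe string built from `num_bytes` random bytes."""
    return secrets.token_urlsafe(num_bytes)


# Prints a line that can be pasted into luci.conf.cgi.
print("cryptkey = " + generate_cryptkey())
```

Each run prints a different key. Keep the generated value private, for the reasons given above.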

• Customizable

- luci.conf.cgi -

Using the Luci configuration file, you can easily tweak Luci to best suit your site's needs.

- templates -

All the HTML associated with Luci is templated, and available in the templates directory provided with the distribution. With these templates, you can quite easily port your site branding to Luci, etc...

• Content Ignore

If you want Luci to ignore specific content of a page, you can make use of the luciignore span tag within your html. For example:

        <span class="luciignore">
        Luci will ignore any content nested within this tag block
        </span>
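The effect of the luciignore span can be illustrated with the sketch below. It is written in Python with the standard html.parser module and is not Luci's own Perl implementation; the class and function names are hypothetical, and comments and doctype declarations are simply dropped to keep the example short.

```python
from html.parser import HTMLParser


class LuciIgnoreStripper(HTMLParser):
    """Re-emit HTML while dropping everything inside <span class="luciignore">."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.kept = []
        self.ignore_depth = 0  # greater than 0 while inside an ignored span

    def handle_starttag(self, tag, attrs):
        if self.ignore_depth:
            if tag == "span":
                self.ignore_depth += 1  # track nested spans so we close correctly
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "luciignore" in classes:
            self.ignore_depth = 1
            return
        self.kept.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the nesting depth.
        if not self.ignore_depth:
            self.kept.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.ignore_depth:
            if tag == "span":
                self.ignore_depth -= 1
            return
        self.kept.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.ignore_depth:
            self.kept.append(data)

    def handle_entityref(self, name):
        if not self.ignore_depth:
            self.kept.append(f"&{name};")

    def handle_charref(self, name):
        if not self.ignore_depth:
            self.kept.append(f"&#{name};")


def strip_ignored(html):
    parser = LuciIgnoreStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.kept)


page = '<p>Keep <span class="luciignore">drop <span>nested</span> this</span> and keep</p>'
print(strip_ignored(page))  # <p>Keep  and keep</p>
```

Counting nested span tags is what lets the sketch find the closing tag that belongs to the luciignore span rather than stopping at the first closing span it meets.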

# ACKNOWLEDGEMENTS

Many thanks to the authors of Betsie ( see: http://betsie.sourceforge.net/ ) for it was that application that inspired the creation of Luci.

We would also like to thank the following for their contributions on this project:

# BUGS

See BUG tracker for information and updates.

The University of New Brunswick: http://www.unb.ca/

UNB's Luci Install: https://www.unb.ca/sweb/parser/

# DISCLAIMER

The authors make no representations or warranties of any kind concerning the quality, safety or suitability of this software. See LICENSE for more information.