Luci - changelog
- Wednesday, July 5th 2006 at 1:04 p.m. - luci/1.4
=head1 TO BE ASSESSED - (for future release)
- what about one instance that calls another that calls back, etc...
  we've cleaned most of this up, but would it be easier to recognize this with mod_perl and/or sessions?
- what about mod_perl? would this allow a throttle of sorts so we could limit requests from crawlers?
- look at providing page summary option
  see feature request #1063126
- add test for proper configuration
  could probably check for required modules. test this with bad settings. we may need to check for tmpl files too; not sure if this is necessary yet. we have marked what needs to be set in the conf. if luci grows much larger, this may become a requirement
- add feature to allow hrs? (see: Lynx)
- try commenting out allowed ext on the default url
  need to look at checking extensions on index files (hidden). the only thing that's messed up here is if someone actually had their server set up to serve something other than one of the extensions in the allowed list as the default index. ex: 'DirectoryIndex index.pdf' in the apache config would force luci to attempt a parse on index.pdf.
- what about download of different mime types? (ie. from cgi - ex: vcard)
- 'graphical version' link in framed page should return entire frameset
- should provide other options for link delimiting
- add functionality for enctype='multipart/form-data' (uploading)
  this will require a repository for luci to temporarily store files on the server
- should there be a logout feature? (ie. wrap in a template?)
  for htaccess (401s)
- look at other protocols: ftp/gopher/etc...
- document qmeta
- fix url provided is not valid etc...
  this has been updated
- debug is available for luci.cgi, should we have a global debug that can be used for the libraries? should all of luci have debugging?
  took this out of the TBA list seeing as the functionality has been added
- _denylist/_allowlist
  see config for rules/conventions
  note: you can use the allowlist/denylist settings together to allow most files, but not all, from a specific domain
- added _denylist feature
- allowed_domains renamed _allowlist
- under apache 2, we noticed luci having difficulties with path_info when the request for luci is already a path_info request. the solution was to simply rebuild the original url, then pull path_info from there
    my $path_info = ($config->{system}->{apache_2} =~ /true/i)
        ? URI::URL->new($ENV{REQUEST_URI}, $ENV{SERVER_URL})->abs->as_string
        : CGI::url(-path_info => 1);
- added default value to $sep variable
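
The note on using _allowlist and _denylist together can be illustrated with a minimal sketch. The actual rules and conventions are in the config and may well differ; the function names, the leading-dot rule format and the "deny wins" policy below are all assumptions made for the example, not luci's implementation.

```perl
use strict;
use warnings;

# Assumed rule format: a leading dot means "this domain and any subdomain".
sub domain_matches {
    my ($host, $rule) = @_;
    $host = lc $host;
    $rule = lc $rule;
    if ($rule =~ /^\./) {
        # ".unb.ca" matches "unb.ca" and "www.unb.ca", but not "xyzunb.ca"
        my $bare = substr($rule, 1);
        return ($host eq $bare || $host =~ /\Q$rule\E$/) ? 1 : 0;
    }
    return $host eq $rule ? 1 : 0;
}

# Assumed policy: denylist entries are host+path prefixes and always win;
# an empty allowlist allows every domain.
sub url_permitted {
    my ($host, $path, $allow, $deny) = @_;
    my $loc = lc($host) . $path;
    return 0 if grep { index($loc, lc $_) == 0 } @$deny;
    return 1 unless @$allow;
    return (grep { domain_matches($host, $_) } @$allow) ? 1 : 0;
}
```

With an allowlist of `.unb.ca` and a denylist of `www.unb.ca/private/`, most of the domain is reachable while one directory stays blocked.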
- wait for john's final version of his MS Environment (2003) doc
  update the website doc when it arrives
- making use of config feature to allow deeper debugging (ie. in the modules)
- there is an issue with multi-checkboxes in post request
  multi-checkboxes were posting null delimited and are now cleaned up and sent as distinct entries
- ok we're done with findbin and mod @INC directly
    use vars qw($path $sep);
    BEGIN {
        # see changelog
        $0 =~ s/^(.*)(\/|\\)// and $path = $1 or $path = '.';
        $sep = $2;
        unshift @INC, $path.$sep."lib";
    }
  this allows a little more flexibility, passes taint and works on MS
- added a config setting for apache 2
  note: default apache 2 install does not recognize path_info as a separate entity, and returns the entire url regardless
- update MS 2000 docs
  done.
- add MS 2003 docs - will require testing
  done.
- perlis.dll
  john's document to provide instructions based on perlis.dll
- what about one instance that calls another that calls back, etc...
  if two instances simply reference each other via path_info
    this is clean.
  if path_info =~ app_root_url, luci will spawn (self recursing)
    this is fixed. we fail over to default_target
  if two instances reference each other via default_target, there is a possibility of infinite loop here
    this is fixed.
    1. since this is recursive, we test for app_root_url in path_info
    2. on occurrence, we intend to swap for default_target
    3. if default_target == app_root_url... fatal error
- test MS 2003 https
- update docs wrt LUCI_CONF and FindBin
- update cvs link
- check all links
  clean up links to yourdomain. they should not href
- add john to contrib
- add john's doc to the website
- use lib (($FindBin::Bin.'/lib')=~/^(.*)$/);
  solution ported from FindBin::libs (find_libs subroutine)
  see source: http://search.cpan.org/src/LEMBARK/FindBin-libs-1.25/lib/FindBin/libs.pm
  as per doc: http://www.gunther.web66.com/FAQS/taintmode.html#require
  when running taint mode (-T), '.' is removed from @INC. we need '.' in order to reference our lib, which is at './lib'. we could set lib manually, ex: use lib './lib'. unfortunately, this is not portable. under the MS environment (2003), this simply will not work, and we would like to avoid having users edit the executables directly. the solution is to use FindBin, which will produce the directory of the running script. unfortunately, FindBin will not pass the taint check when used in this context because $FindBin::Bin is tainted. the solution is to untaint
- again have corrected a bug found in the attempt to pull a proper path_info
  as opposed to failing, we now correct the error by setting path_info manually if it is not provided. this fix also rectifies the error found under MS where an incorrect value is supplied as path_info when no path info is supplied. if app_root_url is not set correctly in the config, the user will notice that links will not work properly. after turning on debug, it may be noted that app_root_url under config does not match the base url that can be viewed at the top of a browser window.
- cleaned up debug settings
  1 dumps vars/env/config
  2 dumps object data also
- converted index.cgi redirect to use CGI::redirect
  simple location: print would not work
- now making use of FindBin for locating luci.conf.cgi
  this was required as a MS fix
- added feature that allows toggle of heading level display
  see conf
- modified the CBC specification in PageFetcher.pm.
  someone ran into problems where the iv was required even though prepend_iv was set false. anyway, these options were removed, and having header set as 'salt' should allow CBC to generate the cipher key and the iv from the passphrase (in our case, the cryptkey)
- added feature <span class="luciignore"> ignore this </span>
  added documentation for this feature
- update sourceforge admin
  added screenshots
  removed forum and patch tracker
  updated all tracker items to correspond with shawnmcginn@sourceforge.net
  cleaned up current tracker items
- look at TBA
  some of this was completed.
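
The three-step recursion check described above (test for app_root_url in path_info, swap for default_target, fatal error if default_target itself points back at luci) might look roughly like the sketch below. The function name `resolve_target` is made up for illustration, and the use of a substring test instead of strict equality in step 3 is an assumption; the real logic lives in luci.cgi.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the URL luci should fetch for this request.
# The three arguments mirror the names used in the changelog entry.
sub resolve_target {
    my ($path_info, $app_root_url, $default_target) = @_;

    # 1. a target that does not point back at luci is safe to fetch
    return $path_info if index($path_info, $app_root_url) < 0;

    # 3. failing over to ourselves would loop forever, so stop here
    #    (assumption: "contains" is used as a looser test than "equals")
    die "fatal: default_target points at app_root_url\n"
        if index($default_target, $app_root_url) >= 0;

    # 2. otherwise swap the recursive target for default_target
    return $default_target;
}
```

A caller would typically wrap this in `eval { ... }` and route the failure to the application's error page.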
- should output follow the xhtml standard?
  added use of module HTML::Entities for proper URL encoding. most other errors cannot easily be corrected by luci. for example, it is a requirement that the id attribute value begin with a letter and not a digit. if the page you are requesting does not follow the proper convention, it will continue to fail the xhtml standard, as luci does not attempt to correct a malformed id value. also found a spelling error in the templates wrt </tbody> that was causing an xhtml error with open tag.
  note: luci does a pretty good job at cleaning up the html, in turn possibly making a site 'more' xhtml compliant
- add DEBUGging feature (for dev)
  would require levels. put DEBUG in conf (use Data::Dumper). to date, debugging has been ad-hoc
  this was done long ago. there is also a TBA on debug in modules
- documentation could use a review
  this is done. docs look good
- look at possible ip forwarding
  the idea here is that luci would represent herself with the ip of the client, as opposed to the ip of her host. we spent some time researching the possibility of forwarding the user ip through luci, but we realize this will definitely introduce many more challenges, and does not appear possible. for one, we believe ip spoofing (which is somewhat indirectly the requirement here) may be considered illegal, although this may be somewhat dependent on the context. on another note, even if we manage to get it working, it should be noted that ip communication is a two way street. ie. responses are returned to the requesting ip, and if luci is to represent herself with the client ip, she will therefore not receive the response, as it will be returned to the client
- we had a request wrt cookie proxying.
  some substantial discussion came from this, and our decision was to avoid implementing this feature for several reasons.
some info follows:
- the request:
From the LUCI documentation, it says that LUCI "stores all cookies that the user comes across in one single cookie." Does this mean that users that create cookies during text-only will not be able to use those cookies in graphical view? If so, is there a workaround?
- our response:
Even though it 'may' be possible to have luci pass a cookie from somesite.com to your browser, adding this functionality would appear quite confusing to a client, seeing as luci cannot do the reverse; ie. provide cookies that you have set in your browser via graphical surfing to somesite.com. The other issue we would have to take into consideration if we were to consider adding this feature is that we are really unsure as to whether or not luci would be able to set a cookie in the client browser that belongs to somesite.com and not mysite.com. Please refer to section 4.3.2 Rejecting Cookies (rfc2109) as per W3C: http://www.w3.org/Protocols/rfc2109/rfc2109
"a user agent rejects a cookie if ... * The value for the request-host does not domain-match the Domain attribute. ...."
So, in other words, a cookie with domain "somesite.com" passed to the browser via luci hosted on "mysite.com" will be rejected.
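
To make the rejection rule concrete, here is a small sketch of the domain-match test as RFC 2109 defines it for host names: the names are identical, or the Domain attribute starts with a dot and the request-host ends with it after a non-empty prefix. The function name is ours, and the sketch ignores the RFC's other rejection rules and the IP address case.

```perl
use strict;
use warnings;

# Returns 1 when $host domain-matches $domain in the RFC 2109 sense
# (host names only; IP addresses and the other rejection rules are left out).
sub domain_match {
    my ($host, $domain) = @_;
    ($host, $domain) = (lc $host, lc $domain);
    return 1 if $host eq $domain;
    return 0 unless $domain =~ /^\./;
    # the host must be a non-empty prefix followed by the dotted domain
    return (length($host) > length($domain)
        && substr($host, -length($domain)) eq $domain) ? 1 : 0;
}
```

Here `domain_match('mysite.com', '.somesite.com')` is false, which is the case the response describes: the browser talks to mysite.com, so a cookie claiming somesite.com fails the test.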
Also, we can't pass cookies from graphical browsing to luci properly because of the path attribute in cookies. The browser calls luci from luci's own path, so the cookies for the requested page (which presumably lives under a different path) are never sent to luci, unless the site sets its cookies on a '/' path. Luci cannot forward to the page what she never received from the client. This makes the transition from graphical browsing cookies to luci browsing cookies impossible.
Going from luci browsing cookies to client graphical cookies is certainly possible on the same domain, but is pointless in practice if the other direction doesn't work, and even more pointless for external domains (if setting cookies for external domains actually works).
So to avoid the confusion that could quite possibly come from this, we've decided against implementation. If a strong case arises, we may revisit this request.
- what about converting luci.conf to luci.cgi or luci.pl or luci.pm
  this will require docs in luci.cgi pod
  done. we've changed it to luci.conf.cgi. docs now reference the new configuration file name
- removed the binmode STDOUT utf8 fix.
  it was messing up obviously simple characters
- cleanup warnings (ie. -w)
  I've cleaned up some... maybe a complete cleanup for 1.4. There aren't that many warnings, and a lot are associated with external modules
  - use of uninitialized value in concatenation (.) or string at lib/CookieManager.pm line 64.
  - use of uninitialized value in pattern match (m//) at lib/PageParser.pm line 130.
  - use of uninitialized value in pattern match (m//) at lib/UtilFunctions.pm line 70
  - use of uninitialized value in concatenation (.) or string at lib/PageParser.pm line 312.
- eval { binmode STDOUT, ":utf8" } fixes wide character in print error
- check on allowed domain .abc.ca and/or *abc.ca before completing the upgrade
  there was a bug associated with this regex where if a user specified yourdomain.com as allowed, xyzyourdomain.com would also be allowed
  this has been fixed
- cleanup lists from conf. set them up as hash lookup (ex: ignore_tag)
  decided against this. current method of use appears to be the most efficient
- consider removing web content from luci package
  the following was decided:
  1. remove the INSTALL.txt
  2. move README.txt to luci root directory. update it to point at luci.sourceforge.net
  3. move LICENSE.txt to luci root
  4. remove docs folder from future distributions
  the idea here is that the documentation we link to will always be the most recent, namely that located at the sourceforge site
- issues wrt too many concurrent invocations of luci
  in our environment, where luci is hosted without any url restriction, we have seen instances where googlebot will attempt to index the internet via luci, in turn saturating any available resources on the host, denying access to all other requests
  solution: performance test luci and see what happens
  this was done using httperf and autobench. results show that all luci invocations start up and release appropriately without any relation to the number of luci processes spawned. the real issue is that crawling engines such as googlebot recognize sites via luci as non-existent in their index, and so begin indexing, even though they should be considered a duplicate. this should be a non-issue for smaller sites that use the allowed_domain directive in the luci.conf file, but for larger sites, or those that do not make use of the allowed_domain directive, this can be a problem. for example, the googlebot engine is not only fast, but runs in a distributed environment, and if it begins spidering the internet via some luci instance, could bombard your server, in turn bringing it to its knees.
  for information on googlebot, see: http://www.google.com/webmasters/bot.html#fast
  final solution: as per the googlebot documentation, we went ahead and added both the robots.txt entries and the meta nofollow pragma necessary to have crawler engines ignore luci. seeing as this affected us directly, we also decided to contact google to ensure we took the proper route, and so google checked and confirmed our robots.txt was set up correctly. without the robots.txt entry, the nofollow pragma should be enough to ensure a crawler engine does not proceed any further than the initial page served via luci. with the robots.txt entry, your luci instance shouldn't be indexed whatsoever. information wrt the proper content of a robots.txt file will be provided with the install documentation, and the appropriate meta tags will be included by default in the header of those templates where it is necessary
- clean up use statements
  done.
- added config ref to the UtilFunctions object
- add disclaimer toggle option to config
  freed up is_internal for its use (see is_allowed)
  we compare is_internal vs. user setting in conf
  added an override that forces disclaimer no matter the domain
  added disclaimer template
  see luci.conf: possible values:
    external - disclaimer will only display on pages external of yourdomain.com
    force - disclaimer will always display
    off - disclaimer is not displayed (default)
- added _is_internal to displayer object
  required for disclaimer option
- fix pod linking errors and clean up documentation
- 180s default timeout appears to be too much...
  when luci takes off on a bad ua request (to be fixed separately), it will spawn some number of processes which can be limited with a decent setting here. the initial setting has been changed to 16s
- can we keep luci from spawning multiple processes?
  it looks as though luci will call herself if app_root_url is not set correctly. hence luci spawning is actually luci calling herself, in turn calling herself... etc...
  this has been fixed. we now test that the substitution in parse_target of luci.cgi actually acquires path_info, and fail otherwise
- as it was required for the path_info error to die, the error routine in UtilFunctions can now be overloaded with an error code, and will force the application to die
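
The disclaimer toggle described above (external / force / off, compared against is_internal) reduces to a small decision. A minimal sketch follows; the function name and argument order are assumptions, and only the three setting values come from the entry itself.

```perl
use strict;
use warnings;

# $mode is the disclaimer setting from the conf ('external', 'force' or 'off');
# $is_internal is true when the target sits inside the configured domain.
sub show_disclaimer {
    my ($mode, $is_internal) = @_;
    $mode = 'off' unless defined $mode;    # 'off' is the documented default
    return 1 if $mode eq 'force';          # override: always display
    return $is_internal ? 0 : 1 if $mode eq 'external';
    return 0;                              # 'off' or anything unrecognised
}
```

Treating an unrecognised value the same as 'off' is a choice made for the sketch; a stricter version could raise a configuration error instead.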
- PageParser error on base target pointing at _top/etc...
  fixed. ignore target
- fix size sort on settings page
  done. went back to <=> and use integer priority in luci.conf
- link delimiter should work on all links including luci stock
  done.
- add cfm extension for parsing
- need to test ssl on IIS
  done.
- include Kyle's documentation
  done. Kyle's docs have been merged
- add DEBUG
  done. very broad debugging has been implemented and can be configured from luci.conf
- https test bug
  when testing ssl in luci.cgi, we now check app_root_url as opposed to testing the SERVER_URL environment variable in an attempt to reduce environment dependencies.
- add Scott to Acknowledgements
  done.
- thanks to unbsj
  it's done.
- add platform specific section
  done.
- I tried to get luci under MS Environment to not require user modification of the 'use lib' and LUCI_CONF constant, but cannot seem to get around the Cwd/FindBin taint problems.
  instructions for MS users will be added
- 'use lib' and LUCI_CONF ref now use single quotes.
  double quotes were causing issues in the MS environment
- pod all files
  done long ago. we need to complete the documentation.
- provide logo
  done long ago - removed from tba.
- removed the $ENV{SERVER_URL} dependency, and now check SSL against $config->{app_root_url};
- need to upgrade the pod in luci.cgi to reflect the new Twofish_PP package requirement and remove Shark
  done.
- switched to the Twofish_PP algorithm which seems to be available to both UNIX and ActivePerl (PageFetcher.pm upgrade)
  Shark was not available from the ActivePerl Package Manager at the time of this writing. Although it could probably be compiled for use under ActivePerl, it was decided we would use the one that is immediately available to the package manager.
- removed the use of $ENV{PATH_INFO}: now using CGI::url and pulling path_info manually.
  this seems to have fixed the PATH_INFO environment variable problem in the MS Environment, and continues to work properly under UNIX.
- css the pod
  decided against this. we like pod
- MS Environment: Luci seems to be able to parse herself. Makes a mess.
  fixed. In luci.cgi, clean url before self parse test
- MS Environment trims double slashes in path_info
  fix added in luci.cgi
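
The changelog does not show the double-slash fix itself. One plausible shape, given purely as an illustration, is to put the second slash back after the scheme when the server has collapsed "http://" to "http:/" inside path_info. The function name and the regex are assumptions, not the code in luci.cgi.

```perl
use strict;
use warnings;

# Hypothetical repair for a path_info such as "/http:/www.unb.ca/page.html".
sub restore_scheme_slashes {
    my ($path_info) = @_;
    # only touch a scheme followed by exactly one slash
    $path_info =~ s{^(/?)(https?):/(?!/)}{$1$2://}i;
    return $path_info;
}
```

A path_info that already carries both slashes is returned unchanged, so the repair is safe to apply on every platform.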
- display settings page is carrying meta refresh from target
  fixed. settings page no longer carries refresh from target
- malformed meta refresh tag does not work properly.
  fixed. luci will now allow for (reasonably) bad content and will correct accordingly
- updated author info in all pods
- removed BUG info from pod.
  BUG info hosted on sourceforge.
- should be a space between 'Undescribed[image]' for alt tag images
  fixed. Also changed [image] to [ALTTEXT]
- update luci.conf with new version information
  done.
- inconsistent display of settings on display settings page
  fixed. using cmp instead of <=> and sort for color schemes
- fix wording - luci error page: 'request that they host Luci in a "secured"'
  done.
- inconsistent name struct for settings: ie. style settings::display settings
  done. both are now 'display settings'
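
As an illustration of what "allow for (reasonably) bad content" could mean for a meta refresh tag, the sketch below parses the content attribute leniently: the separator may be ';' or ',', the "url=" prefix may be missing, and quotes around the target are optional. This is a guess at the kind of tolerance involved, with a made-up function name; it is not the parser luci uses.

```perl
use strict;
use warnings;

# Returns (delay, url) from a meta refresh content value, or an empty list
# when no leading delay can be found. url is undef for a plain reload.
sub parse_meta_refresh {
    my ($content) = @_;
    return unless defined $content;
    my ($delay, $url) = $content =~ m{
        ^\s*(\d+)\s*              # delay in seconds
        (?:[;,]\s*                # separator, either form
           (?:url\s*=\s*)?        # the url= prefix is sometimes left out
           ['"]?([^'"\s]+)['"]?   # target, quotes optional
        )?                        # the whole target part is optional
    }xi or return;
    return ($delay, $url);
}
```

Once the two parts are recovered, a well-formed tag can be rebuilt from them in the usual `delay; url=target` form.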
- add link to luci page where version is displayed
  done.
- change 'text only' to 'accessible version' in main docs
  done.
- add new luci logo, and add image sizes, alt, etc...
  30x30 and 150x150
- change NAME in luci pod -> Luci - (href)UNB's eLUCIdator
  done.
- link ref to UNB, and add Canada
  done.
- add link to UNB in related links
  done.
- add UNB Webteam to acknowledgements with email and web address
  done.
- remove empty <tfoot> tags from templates
  done.
- cleanup use statements
  done.
- TM application references (if required)
  done.
- comment the config
  done.
- install
  done.
- required modules
  done.
- why the rewrite?
  done.
- write something so that Luci automatically grabs the correct url.
  done. see index.cgi in distribution root
- explain what needs to be done to hide the conf (if this is needed)
  done.
- force fail on https request when luci hosted under http
  done.
- source_delimit was causing problems where inline tags such as <b>, <i> etc... were embedded within a string.
  We've added an ignore_tag array to the config. tags in this list will not be replaced with whitespace. see conf for details.
- removed the path_info trailing slash fix
  the fix will force luci to see documents without an extension as a directory
- added 'allowed tags' in PageParser - ex: pre is now allowed
  useful for directory listings
- add option in config to allow carriage return in the HTML or not
  this can probably happen anywhere except within <a></a> tags. if we put a carriage return before </a>, then we get trailing whitespace in our links
  done. see source_delimit option in config (ie. \n follows most closing tags)
  note: if this option is used, we loop through the html content to add line breaks, else, we simply use a 'join ""'.
- problems with value="" attribute of option, radio tag
  pushing empty attributes can mess up input tags
  ex: if a tag (like option/radio) has the value attribute specified as "" (empty), then the value attribute should be added to the tag (hence the reason for 'if exists'). If the value attribute is not specified in the tag, then it should not be included, as the value outside the tag should be the result
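
The 'if exists' point in the last entry is easy to get wrong, so a short sketch may help. The helper below is hypothetical (it is not PageParser code) and does no HTML escaping; it only shows why the test has to be `exists` and not a truth test on the value.

```perl
use strict;
use warnings;

# Rebuild an opening tag from a parsed attribute hash, keeping only the
# attributes named in @allowed.
sub render_tag {
    my ($name, $attr, @allowed) = @_;
    my $out = "<$name";
    for my $key (@allowed) {
        # exists, not truth: value="" must survive, a missing value must not appear
        next unless exists $attr->{$key};
        my $val = defined $attr->{$key} ? $attr->{$key} : '';
        $out .= qq{ $key="$val"};
    }
    return $out . '>';
}
```

With `{ value => '' }` the tag keeps its empty value attribute; with an empty hash no value attribute is emitted, so the browser falls back to the text outside the tag.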
- cleanup warnings (ie. -w)
  added proper testing for $domain_str in UtilFunctions
  when luci.conf was updated, the reference to cryptkey was not. fixed
  init the tags stack and the html array ref in PageParser
  test for defined on content-length header for frameset pages
- cleanup conf
- remove commented (unused) code
- repeating site navigation: jwebster@unb.ca && shawn@unb.ca
  JW: "It would be nice if site navigation items (as opposed to page navigation items), could be at the end of the page instead of at the top."
  SM: "I think css will solve this problem. The only other way around this (easily) would be to add special 'comment' tags that luci could parse for and would force luci to skip or print some specified HTML."
  we will add this feature in a future release if requested.
- array ref the frame and frameset html strings for consistency
- added hidden fields to replicate a submit via image
  for pages that are dependent on image submit, we cannot replicate the .x .y coordinate values passed, but if only using image submit for looks and possibly capturing the fact that the button was pressed it should work
- anchors are being picked up as hrefs. should probably test for href-less anchors
  fixed. anchors will not be delimited, and work properly
- look at style a:hover, a:active { color: ; }
  fixed. incorrect specification in template
- rename parser.cgi -> luci.cgi
  done. also renamed the conf luci.conf
- external links - from jwebster@unb.ca && shawn@unb.ca [note: some content snipped]
  JW: "[This raises] the bigger question: why do we boot the user out of luci when they visit an external link we're sending them to. I assume there is some technical-ish reason but it seems a bit mean and annoying from the sight-challenged user's point of view?"
  SM: "If a user leaves one of the domains specified in the config (ex: .unb.ca,.unbsj.ca, .unbf.ca, etc...), then a link is considered to be external. This will require us to add domains that do not fall under our unb structure such as www.unbfutures.ca, although adding domains is not an issue - easily done in the config. The reason for having this feature is that without restriction on the URL, a user can fetch any URL he/she wishes. Our initial concern was that a user may attempt to download large content from an external server. If a user were to invoke the application several times in parallel against a very large document, this could possibly impact server and network performance. As stated, this was an initial concern. Kyle and I have added a feature that allows the application owner to specify a maximum download size in the config. Our development version is currently set at 1MB. If a user tries something larger than the max, they will get an error page explaining the error, and asking them to contact us [application owner] for further discussion.
  Another concern: Should we be allowing our network bandwidth for surfing external websites? Essentially, this is what luci would allow without the restriction.
  Third: luci is not guaranteed to work 100% correctly with really 'AWFUL' html. We are lucky that most users at UNB (at least those with more popular sites) tend to use a good editor such as Dreamweaver to build their pages. That being said, if a page at UNB has a problem when being parsed by luci, we can easily fix the HTML. I am not sure how we support the application wrt external sites. Perhaps a good disclaimer would be all that is required. Allowing luci to parse anything is as simple as (not) setting a specific parameter in the config."
  If the 'allowed_domain' parameter is left empty in the config, then any requested URL will be parsed. As stated here, there are some issues to be taken under consideration if you decide to allow any URL.
- shouldn't external links open in a new browser window? (from jwebster@unb.ca)
  external_target in the config specifies external link behavior
- config tds and allow only spaces or line breaks (see PageParser.pm)
  decided against this. if needed, this could be added in a future release. Spaces look ok.
- separate adjacent links with more than white space
  see: http://webxact2.watchfire.com/themes/standard-en-us/help/HIDD_WDContent_G35.html
  there is now an option in the config to allow either white space or [] (block) as the delimiter. area hrefs are also delimited (map).
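
The two delimiter styles mentioned in the last entry can be pictured with a tiny sketch. The style names 'space' and 'block' and the function name are invented for the example; the real option and its values are documented in the config.

```perl
use strict;
use warnings;

# Join adjacent links using either plain white space or [] blocks.
sub delimit_links {
    my ($style, @links) = @_;
    return join ' ', @links if $style eq 'space';
    # anything else is treated as the block style in this sketch
    return join ' ', map { '[' . $_ . ']' } @links;
}
```

The block form gives adjacent links a printable character between them, which is the property the "more than white space" accessibility check asks for.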
- The following change requirements from md@unb.ca should help luci maintain a site's Bobby(tm) compliance
  - allow 'id' attribute in the following tags: a, form, input, select, area, option, textarea, frameset, frame
    map - it's automatic (we don't manipulate the map tag)
  - put <label> tags back in the html
  - replace <b> tags with <strong>
  - add lang="en" to <html> tag
  - add DOCTYPE before html
  - <style> tag needs type attribute: <style type="text/css">
  - place language meta tag after <HEAD>
  - id style (#id) should be unique, yet class tags can be used more than once:
    change #small style to .small and replace id="small" with class="small"
  - footer table found in content.tmpl:
    add summary attribute to <table>
    add <thead>, <th>, <tbody>, <tfoot>
    remove <caption>: according to Bobby(tm) (see: http://bobby.watchfire.com/bobby/html/en/gls/g270.html) it is recommended that you use <caption> where a table is used to provide data. The table we've provided was used specifically for layout.
- try converting content string (self->{html}) to an array
  done. all html content is now stored in an array ref: self->{html}
- with these additions, Luci should at a minimum maintain a site's Bobby(tm) status. It shouldn't degrade Bobby(tm) approval.
- html has been templated using HTML::Template
- if an image link contains no alternative text, it will be labelled 'undescribed'. perhaps we need to allow the option to set it to the associated URL in config
  we've added the ability to set the 'undescribed' text in the config file. allowing href here looks somewhat cumbersome, but should be looked at in a future release
- frame page refresh
  when adjusting display settings from a frames page, all frames will be updated upon returning to the referer. this feature could be added to each option in the 'Page Style Settings' page, but unless requested, we've chosen not to in light of less reloading on the server.
- added support for \%rest in cookie manager.
  see: HTTP::Cookies -> http://search.cpan.org/~gaas/libwww-perl-5.800/lib/HTTP/Cookies.pm
- redirect correct url in browser window
  when a user is redirected, luci will now correct the URL in the location bar. allows bookmarking
- what about checkboxes/radios/select boxes that are supposed to be checked?
  added selected to attr list for select boxes
  added checked to attr list for checkboxes and radios (input boxes)
- Line spacing has been added as a user configurable option
- Taint checking should be turned on. (and should work fine)
  it is, and works great
- luci should not parse herself
  if luci were able to parse herself, a substantial load could possibly be applied against a server via recursive parsing. on recurse attempt, the system->default_url is parsed instead. see config.
- ssl
  luci was written to allow access to ssl enabled web sites. although she can handle the protocol on her own, running luci from non-ssl (http) implies un-encrypted communication between luci and the browser. for complete security, luci should reside behind ssl. one advantage is those sites which require login via ssl. your information will pass encrypted.
- what about error codes from the server
  errors returned by the server are displayed as any other html parsed by luci. it is up to the server admin to provide detailed information in their 404, etc... pages
- visited links
  this feature works; current colors are not far off of a:link yet can be redefined in config
- output autoflush: $|++ (see 'man perlvar' $OUTPUT_AUTOFLUSH)
  we've tested this feature to see if our application would take advantage of flushing, but according to the docs, it's only autoflush on output, and we are dumping everything with one statement. (non-incremental) no advantage, and therefore was not used.
- cookies: size and excessive use
  the cookie specification as per RFC2109 (see: http://rfc.net/rfc2109.html) states that a user agent must at LEAST accommodate for a cookie of 4 kb in size. (quashing the myth that browsers only support cookies 4kb in size) luci not only stores all cookies that the user comes across in one single cookie, but also stores any information pertaining to the creation of those cookies within that cookie. (see: http://search.cpan.org/~gaas/libwww-perl-5.800/lib/HTTP/Cookies.pm#METHODS for details on cookie creation) as a result, the application should work very well where cookie sizes are moderate. it is up to the browser how it will perform once the cookie size surpasses the required 4 kb limit. (truncation a definite possibility) Therefore, application performance wrt cookie size and excessive use is uncertain
- put application name in config (ie. user agent)
- added error message for max download size exceeded
- specify max download size in configuration file
- clean up multiple brs
- cleanup input URL
- add regex against internal domains
- input type=img replaced with submit button
- tds converted to spaces
- anchors
  added name attribute to links
- html content size (user agent has a default?)
  added the size restriction - see config
- switch settings to css...look at line spacing
  done
- what are we going to do with title?
  Title = .... seems good
- ~s cause problems
  redirection urls were not resolving wrt the current target. fixed
- view image alt tags as an option
  done
- label links as external
  done
- file downloads
  specify in config what files are to be parsed. All others are for download
- select statement problems
  fixed (removed option tag from stack)
- directory listing
  works fine
- htpasswd
  you cannot be logged into more than one htaccess site at a time. once logged into one site, if the user requests a different 401, they will be prompted for the 401 credentials, and this new data will overwrite that which was set for the initial htaccess site.
- cookies
  what about expiring cookies according to their own specs? - at a max of what our cookie expires?
  cookies will expire because they are treated normally when thrown at the server from LWP. we may hold onto the cookie data for a period of time greater than that required of the cookie, but a cookie should fail if past expiry
- what about a time out feature?
  see LWP::UserAgent docs and constructor in PageFetcher
  timeout can now be specified in the config - default is 180s
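
Several of the entries above (application name as user agent, max download size, timeout) map onto constructor options of LWP::UserAgent. The sketch below shows one way the three settings could be wired together; the %conf keys and values are hypothetical stand-ins for whatever the conf file actually uses, and the PageFetcher constructor may be organised differently. It needs libwww-perl installed, which luci already requires.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical config values; the real keys live in luci's conf file.
my %conf = (
    app_name     => 'luci/1.4',
    timeout      => 16,              # seconds
    max_download => 1024 * 1024,     # bytes (1MB)
);

# agent, timeout and max_size are documented LWP::UserAgent options
my $ua = LWP::UserAgent->new(
    agent    => $conf{app_name},
    timeout  => $conf{timeout},
    max_size => $conf{max_download},
);
```

When a response is cut short by max_size, LWP marks it with a Client-Aborted header, which gives the application something to test before showing the "max download size exceeded" error page.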
See Luci Documentation for more information.