I spend a lot of time without internet access, and DF is such a hardcore game that I find myself severely hobbled without the glorious wiki to provide the raw information I need. I'm almost unable to play DF without either immediate access to the wiki or printouts of the pages I require. Since I'm getting back into the game right after a new version's release (I stopped playing around 31.18, so there are a lot of features I have yet to learn), I'm in constant need of the wiki to familiarize myself, so I basically can't play DF offline. Wikipedia has the facility to dump a massive number of pages, minus talk pages, user pages and old revisions, to a reader's hard drive, and the WikiTaxi tool can interpret those dumps (i.e. the MediaWiki database format) and display them in a convenient reader for offline use.

The whole ZIM file infrastructure is pretty broken. I've been trying to put together a system for generating a WARC file by rendering all the wikitext content in a database dump, which is a much more reasonable approach. Rendering wikitext is challenging, though, since wikitext can include chunks of other wikitext and can use some pretty complicated templating functionality. Oddly enough, where I've run into the biggest issues is weird slowdowns in the Python warcio library that make dealing with large archives just about impossible. I haven't had time to really track that down, but if anyone wants to, it's pretty easy to reproduce: just try adding a few million lorem ipsum articles and look at how far from linear time it's running.

There are a lot of advantages to starting from a dump: you can provide much better tools for filtering articles, and probably even rudimentary document classification. You can also do things like re-compress and minify images; a dump intended for a cellphone probably doesn't need 4K images. WARC is also probably a better tool for distributing web-archive type content, like Wikipedia dumps. You can distribute a package of text content and image content as separate files, for example. Generally I have not been very impressed with the quality of ZIM file tooling. One disadvantage is that you need to provide separate search indexing, but that's doable.

I'd love to be able to get a Wikimedia grant to work on this and take on less contract work, but so far their grant process is pretty hard to follow.
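
On the point about wikitext pulling in chunks of other wikitext: here is a toy sketch of what that recursion looks like. It is not how MediaWiki or any real renderer works. The `PAGES` dictionary, the template names and the `render` function are all made up for illustration, and it only handles numbered parameters like `{{{1}}}` and plain `{{Name|arg}}` calls (no named parameters, parser functions or `<noinclude>` handling). Filling unsupplied parameters with an empty string is also just a simplification of this sketch.

```python
import re

# Hypothetical in-memory "dump": page title -> raw wikitext.
PAGES = {
    "Template:Greeting": "Hello, {{{1}}}!",
    "Template:Boxed": "[ {{Greeting|{{{1}}}}} ]",
    "Main": "Intro. {{Boxed|world}} Outro.",
}

PARAM = re.compile(r"\{\{\{(\d+)\}\}\}")   # numbered parameter, e.g. {{{1}}}
CALL = re.compile(r"\{\{([^{}]+)\}\}")     # innermost call, e.g. {{Name|arg}}
MAX_DEPTH = 20                             # guard against templates that include themselves


def render(text, args=(), depth=0):
    """Expand template calls in `text`, recursing into each template body."""
    if depth > MAX_DEPTH:
        raise RecursionError("template nesting too deep (possible loop)")

    def fill_param(match):
        index = int(match.group(1)) - 1
        # Simplification: an unsupplied parameter becomes an empty string.
        return args[index] if 0 <= index < len(args) else ""

    # Substitute this template's own parameters before looking for calls.
    text = PARAM.sub(fill_param, text)

    def expand_call(match):
        name, *call_args = [part.strip() for part in match.group(1).split("|")]
        body = PAGES.get("Template:" + name)
        if body is None:
            return "[missing template: " + name + "]"
        # The included page is itself wikitext, so it has to be rendered too.
        return render(body, call_args, depth + 1)

    # Keep expanding innermost calls until none remain.
    while CALL.search(text):
        text = CALL.sub(expand_call, text)
    return text
```

With the sample pages above, `render(PAGES["Main"])` gives `Intro. [ Hello, world! ] Outro.`. Even this tiny version needs a depth guard and a fixed order of substitution, which hints at why rendering a full dump is a lot more work than it first sounds.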