I've made some changes to the way the neural network works and to its n-gram inputs. This should result in better-structured sentences. The genetic algorithm that evolves the neural network runs constantly, so the predictions should get better day by day (who knows, maybe one day it will actually learn how to predict a proper horoscope). The context-free grammar still needs a little work, so if you spot anything totally ridiculous, leave a comment and point it out to me. Unfortunately, a grammar written for regular English doesn't work too well with the garbage writing you usually find in horoscopes, so I've had to evolve my own (yet another GA in the loop).
The spider seems to have done its best to find any suitable websites, bringing the total number of seed horoscopes in the database to about 33,000. This will continue to climb, but not as fast as before (until I rework the spidering code).
I've also adjusted the Levenshtein distance and longest common substring checks to be less tolerant, so that each generated horoscope is no more than 0.5% similar to any horoscope already in its database. This should make for some interesting reading (and reduce the possibility of me being sued for copyright infringement). The other three tunable parameters are a secret, which I leave as an exercise for the reader...
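For anyone wondering what a check like that might look like, here is a minimal sketch in Python. It is not the code running on the site. The function names, the way each measure is normalised by the length of the longer text, and the choice to take the larger of the two scores are all assumptions made for illustration; only the two measures and the 0.5% figure come from the post.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, using the two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the rows as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous run of characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def similarity(a: str, b: str) -> float:
    """Score in [0, 1]; the normalisation here is an assumption, not the site's."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty texts are treated as identical
    lev_sim = 1.0 - levenshtein(a, b) / longest
    lcs_sim = longest_common_substring(a, b) / longest
    return max(lev_sim, lcs_sim)


def is_novel(candidate: str, corpus: list[str], threshold: float = 0.005) -> bool:
    """True if the candidate is at most `threshold` similar to every seed text."""
    return all(similarity(candidate, seed) <= threshold for seed in corpus)
```

Calling `is_novel(new_text, seed_horoscopes)` would then accept or reject a generated horoscope. Under this particular character-level normalisation a 0.5% threshold is extremely strict, so the real checks presumably define similarity differently (for example over words rather than characters).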
Saturday, April 25, 2009