-I changed direction on how I'm porting the analysis from the old server to the new one. Instead of porting the old Java code as-is to Solr, I'm working out the steps each analyzer takes and defining those steps directly in schema.xml. This should make future maintenance much easier.
-I've been making my way through each core: deciding which of the new field types to use for each attribute, importing each core's data from the database, fixing errors as they pop up, trying to improve the indexing, and thinking ahead a bit about how we will use some of Solr's features. I'm through Annotation, Area, Artist, CDStub, Editor... and the further I go the quicker it goes, since each core covers more cases that are likely to come up in later fields.
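To make "defining the steps directly in schema.xml" concrete, here is a minimal sketch of what one such field type could look like. The type name, field name, and the particular filter chain are made up for illustration and are not the actual ported analysis; the factory classes themselves are standard Solr ones.

```xml
<!-- Sketch only: names and filter chain are illustrative assumptions,
     not the real analysis being ported. -->
<fieldType name="text_example" class="solr.TextField" positionIncrementGap="100">
  <!-- A single <analyzer> with no type attribute is used at both index and query time -->
  <analyzer>
    <!-- Step 1: split the input into tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Step 2: lowercase every token -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Step 3: fold accented Latin characters to their ASCII equivalents -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field that uses the type above -->
<field name="artist_example" type="text_example" indexed="true" stored="true"/>
```

Each step that used to live in Java becomes one line in the chain, so changing the analysis later means editing the schema rather than recompiling code.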
-Port the Area, Artist, and Label boost configurations
-Add remaining entities to sir
-Continue working through cores
I'm still not sure why we need this... maybe I'm overlooking something. It looks like tokens aren't stored, only analyzed and indexed; but if we use the same analysis at query time as at index time (which it appears we do), the indexed tokens that retain their accents will never be accessed. ...correct?
I really don't understand the purpose of these...could someone explain?
Would an ICUTransformFilter using Greek-Latin and Cyrillic-Latin rule sets do the trick? ...or are resources the primary concern?
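For reference, a rough sketch of what I have in mind, assuming the ICU analysis jars from Solr's analysis-extras contrib are on the classpath. The field type name and the rest of the chain are hypothetical; "Greek-Latin" and "Cyrillic-Latin" are standard ICU transliterator IDs. Whether this is acceptable resource-wise is exactly the open question.

```xml
<!-- Sketch only: type name and surrounding chain are illustrative assumptions. -->
<fieldType name="text_translit_example" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Transliterate Greek script to Latin; other scripts pass through unchanged -->
    <filter class="solr.ICUTransformFilterFactory" id="Greek-Latin"/>
    <!-- Transliterate Cyrillic script to Latin -->
    <filter class="solr.ICUTransformFilterFactory" id="Cyrillic-Latin"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```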
-I noticed our analyzers don't use stop words. Is this something to continue? It seems like a good, conservative list of English stop words would be useful.
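If we did want stop words, the change would be a single filter line. This is only a sketch: the type name and the stop word file name are placeholders, and the list itself would still need to be chosen.

```xml
<!-- Sketch only: type name and stopwords file are hypothetical. -->
<fieldType name="text_stop_example" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Drop any token that appears in the listed file (one word per line) -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en_example.txt"/>
  </analyzer>
</fieldType>
```

One thing to keep in mind when picking the list: a name made up entirely of stop words would produce no tokens at all, which is a reason to keep the list conservative.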
Personal Criticism: -I've been really bad about getting my code up on GitHub... I hope to get it up today. I just haven't gotten into the habit of it... I'm so used to working by myself on school projects.
Personal Observations: -Solr is awesome! It's super powerful...and it's really easy to get bogged down thinking about adding extras.