mSearch and the HTML Newsletter

webmaster Joined: Dec 26, 2005 Posts: 254

Kguske,

I have been working on version 1.4 of the HTML Newsletter (my last one in the 1.x branch under PHP4, I hope). I want to change over from a file-based saving mechanism to the database, and you had mentioned the potential with mSearch.

I have not had time to look at mSearch, so would like your input on my data modeling efforts based on your experience with mSearch.

What I am struggling with a bit are the trade-offs between saving all the "components" of the newsletter vs. the entire end result of it, with everything "built". It seems to me that all the links to the latest content (such as the "latest news", "latest downloads", etc.) would be redundant and would not provide value from a search standpoint. I am also concerned about having different links, yet again, to the same content pages (for SEO reasons) - this is an especially bad problem with news articles.

Therefore, what I am thinking of is that the basic component of "content" that is potentially "new and fresh" is the actual newsletter body, what the admin keys in, or pastes in, and which forms the primary content of the newsletter. That would probably be what should be searchable by mSearch? If so, then I do need to store the components of the newsletter rather than the final built version.

I just wanted to get your (and others') perspectives on this before I get too deep into this fairly significant change.

Thanks!

Site Admin Joined: May 12, 2005 Posts: 876

I haven't looked in a while, but I think the way Fancy Newsletter works is to save the newsletter content into a text field, which can be searched by mSearch. As you correctly point out, this *could* cause problems with duplicate content, though having a link to other content on the same site probably will not be an issue (e.g. sitemaps, etc., do this, but the newsletter itself has its own link).

The way I look at it, newsletter content is everything that was included in the newsletter. If you don't store everything in the database, will you store some sort of reference so you know what was included? For example, a boolean that says show the last 10 forum posts. How will you know which posts were included if they aren't stored? Isn't that what is currently stored in HTML format, and available from a link today?

Quote:

If you don't store everything in the database, will you store some sort of reference so you know what was included?

I am sure that I would have eventually come across this, but I am so glad you said something! It is funny now because it seems so obvious...

Quote:

though having a link to other content on the same site probably will not be an issue (e.g. sitemaps, etc., do this, but the newsletter itself has its own link).

Actually, come to think of it, I already have this issue with block and module access to the newsletters. The newsletters are crawled and they could have slightly different links (for example, article1.html vs. article-1-thread-0-0.html, and there are more... possibly even the sitemap; these ARE multiple links to the exact same content, which is not a good thing). But what I really need to do is focus on picking the best link for each content area and sticking with it, and SEO will just have to sort itself out (because nuke is just not SE-friendly OOTB).

Thanks! You have saved me a bunch of re-work AND it makes the design much, much easier.

Absolutely right!
Each newsletter is a 'snapshot' of the data held at the 'moment in time' the newsletter is sent. Therefore it needs a constant hard reference, whether by saving the actual textualised data as a whole or by storing the reference paths (forumpost#497 etc.).

For this reason you cannot store a relative query string, i.e. lastpost, lastpost-1, lastpost-2, because what those resolve to changes as soon as new posts arrive.
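As a rough illustration of the "hard reference" idea, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (newsletter, newsletter_item, forum_post) and the helper functions are hypothetical, made up for this example; they are not the HTML Newsletter or mSearch schema. The point is only that "latest N posts" is resolved to concrete ids once, at send time, and those ids are what gets stored next to the searchable body.

```python
import sqlite3


def create_schema(conn):
    """Create hypothetical tables: a stand-in forum table plus newsletter storage."""
    conn.executescript(
        """
        CREATE TABLE forum_post (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE newsletter (id INTEGER PRIMARY KEY, body TEXT);
        CREATE TABLE newsletter_item (
            newsletter_id INTEGER,
            item_type TEXT,
            item_id INTEGER,
            position INTEGER
        );
        """
    )


def send_newsletter(conn, body, latest_posts=3):
    """Store the admin-written body and freeze which posts were included."""
    # Resolve "latest N posts" to concrete ids now, not when the archive is viewed.
    post_ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM forum_post ORDER BY id DESC LIMIT ?", (latest_posts,)
        )
    ]
    cur = conn.execute("INSERT INTO newsletter (body) VALUES (?)", (body,))
    newsletter_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO newsletter_item (newsletter_id, item_type, item_id, position) "
        "VALUES (?, ?, ?, ?)",
        [(newsletter_id, "forumpost", pid, pos) for pos, pid in enumerate(post_ids)],
    )
    conn.commit()
    return newsletter_id


def included_items(conn, newsletter_id):
    """Return the (item_type, item_id) pairs that were frozen for a newsletter."""
    rows = conn.execute(
        "SELECT item_type, item_id FROM newsletter_item "
        "WHERE newsletter_id = ? ORDER BY position",
        (newsletter_id,),
    )
    return [tuple(row) for row in rows]
```

With this layout only newsletter.body would need to be exposed to the search, and a post added after sending does not change what included_items reports for an older newsletter.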

Personally, I would only be worried about 'duplicate data' if you were changing the textual link.
For example;
forums.html-postid=12
forums.html-postid=12
forums.html-postid=12
Would seem perfectly OK. Yes it is multiple links to the same content BUT you are not trying to hide that fact.

However;
forums.html-postid=12
newsletter-01-forums.html-postid=12
*might* be considered far less favourably.

Obviously this is a poor example, but if you were to look at News, that's a different and more serious kettle of fish!
We already have URLs like;
News&file=categories&op=newindex&catid=1
News&file=article&sid=91&mode=thread&order=0&thold=0
News&file=article&sid=91
all of which lead to the same content - yuck!
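One way to "pick the best link and stick with it" for the article case is to normalise every link before it goes into the newsletter. The sketch below works on the shortened link form quoted above (module name first, then key=value pairs). The function name canonical_link and the choice of mode, order and thold as display-only parameters are assumptions for illustration, not anything that ships with nuke.

```python
# Parameters assumed to change only how an article is displayed, not which one.
DISPLAY_ONLY_PARAMS = {"mode", "order", "thold"}


def canonical_link(link):
    """Drop display-only parameters so equivalent article links compare equal."""
    module, *pairs = link.split("&")
    kept = [p for p in pairs if p.split("=", 1)[0] not in DISPLAY_ONLY_PARAMS]
    return "&".join([module] + kept)
```

Under those assumptions, both article links above reduce to News&file=article&sid=91, while the category link is left untouched because it has no display-only parameters.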

Now I have to go and discover why my shortlinks isn't on grr.

Thank you for posting the links from News. This is definitely what I have been harping on for a while now. The same problem exists between News and nukeFEED too, at the news category level:

Feeds&fid=3&type=HTML (which happens to be at a news category level)

vs

News&new_topic=3

Each of these produces "substantially similar" content. It does not have to be exact duplicate content. Unfortunately, the SEs are not going to tell us their algorithm, but in my readings the point has been drilled home with me AND I can also see how algorithms could be built to recognize this type of thing.
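To give a feel for how "substantially similar" could be measured, here is a small sketch of one well-known textbook technique: Jaccard similarity over word shingles. This is purely illustrative. Nothing here is claimed to be what any search engine actually runs, and the function names and the shingle size of 3 are arbitrary choices for the example.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    if len(words) < size:
        # Very short text: treat the whole thing as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a, b, size=3):
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)
```

Two pages that differ only in a header or footer would score close to 1.0 with a measure like this even though they are not byte-for-byte duplicates, which is the worry with the Feeds and News category pages.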
