nukeSEO.com - PHPNuke SEO Search engine optimization, professional tools including nukeSEO, nukeSPAM, nukeFEED, nukePIE, nukeWYSIWYG and more

 

nukeSEO.com: Forums



add nofollow and target _blank through a filter function
 
 
nukeSEO.com Forum Index -> PHP-Nuke enhancement ideas
GazJ
webmaster


Joined: Mar 20, 2007
Posts: 29

Posted: Fri Apr 01, 2011 4:59 pm    Post subject: add nofollow and target _blank through a filter function

Not sure I would call it a filter function, but oh well.

Anyway, I was attempting to manipulate the output buffer before it was sent to the browser and soon found drawbacks to this, so I moved on and decided to apply it to my template system instead. I suppose it could also be added to the check_html function.

The function itself:
Code:
function playWithHtml($OutputHtml){
    // Rewrite every opening <a ...> tag in a single pass, so each tag is
    // handled exactly once even when the same link appears several times.
    // "\s" after "<a" keeps tags like <abbr> from matching.
    return preg_replace_callback('/<a(\s[^>]*)>/i', function ($Match) {
        // Attribute text of the tag, e.g. ' href="http://x.com"'
        $LinkTag = rtrim($Match[1]);
        // Only touch absolute links (covers both http: and https:)
        if (preg_match('/\shref=["\']?http/i', $LinkTag)) {
            // Open off-site links in a new window unless a target is already set
            if (!preg_match('/\starget=/i', $LinkTag)) {
                $LinkTag .= ' target="_blank"';
            }
            // Add nofollow unless the tag already carries a rel attribute
            if (!preg_match('/\srel=/i', $LinkTag)) {
                $LinkTag .= ' rel="nofollow"';
            }
        }
        return '<a' . $LinkTag . '>';
    }, $OutputHtml);
}


Now apply it to a string:
Code:
$string = playWithHtml($string);


What does it do exactly? It uses regex to search for link tags pointing to off-site links (links with an absolute URL such as http://somelink.com) that have no target attribute and adds target="_blank"; it does the same for rel="nofollow".

Just a quick conversation starter; any suggestions to improve this would be welcome.
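Since suggestions were asked for: one possible alternative, offered only as a sketch, is to let PHP's DOM extension parse the markup instead of matching tag text with regex, so attributes split across lines, unusual quoting or a `>` inside an attribute value cannot confuse it. The function name `add_link_attributes_dom` is hypothetical; it assumes the DOM extension is enabled, the input is UTF-8, and PHP 5.3.6 or later (for `saveHTML()` with a node argument). Like the regex version, it treats only absolute `http://`/`https://` hrefs as off-site.

```php
<?php
// Hypothetical DOM-based variant of playWithHtml(): parse the fragment and
// edit the <a> attributes directly instead of rewriting tag text with regex.
function add_link_attributes_dom($html)
{
    $doc = new DOMDocument();
    // Collect parser warnings (e.g. unknown HTML5 tags) instead of printing them
    $previous = libxml_use_internal_errors(true);
    // The XML encoding declaration hint makes libxml read the input as UTF-8;
    // the wrapper div lets us serialize exactly the fragment passed in.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $link) {
        // Only absolute http:// or https:// links are treated as off-site
        if (!preg_match('#^https?://#i', $link->getAttribute('href'))) {
            continue;
        }
        if (!$link->hasAttribute('target')) {
            $link->setAttribute('target', '_blank');
        }
        if (!$link->hasAttribute('rel')) {
            $link->setAttribute('rel', 'nofollow');
        }
    }

    // The wrapper is the first div in document order; output its children only
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

The trade-off is speed: building a DOM tree costs more than a regex pass, so on a busy site it is probably better applied when content is saved than on every page view.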
  
Guardian
webmaster


Joined: Dec 25, 2005
Posts: 364
Location: Vsetin, Czech Republic

Posted: Fri Apr 01, 2011 5:34 pm

I have seen some simple and also very elaborate ways to do the same thing using JavaScript. The only problem is that the people who devised these efforts forgot one simple fact: generally, spiders don't run JavaScript, so using it to add the nofollow attribute is pointless.

The PHP approach, like your example, is really the only way, though it is also pretty simple to add the nofollow ability to FCKeditor for links in things like News and comments.

You should also keep in mind that only Google adheres to the nofollow attribute religiously. A few bots like MSN/Slurp crawl the links but don't index them (even though they still count each one as an outward link in terms of link juice), while the majority simply ignore it.

If your sole goal is to reduce link dilution, the only really effective way to do it is to hide the link and expose it with JavaScript (so it's visible in a browser), or use PHP to hide the link from specific user agents:
http://www.code-authors.com/modules.php?name=CA_Snips&op=view_snip&sid=11
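The PHP option can be sketched roughly as follows. This is not the linked snippet; `link_for_agent` is a hypothetical helper and its bot list is an illustrative subset of crawler user-agent tokens, not a complete one. Taking the user agent as a parameter instead of reading `$_SERVER` directly keeps it easy to test.

```php
<?php
// Hypothetical helper: return a real link for browsers, but only the plain
// (escaped) link text for user agents that look like crawlers.
function link_for_agent($url, $text, $userAgent)
{
    // Illustrative subset of crawler tokens, not a complete list
    $bots = array('Googlebot', 'Slurp', 'msnbot', 'bingbot');
    $safeText = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    foreach ($bots as $bot) {
        if (stripos($userAgent, $bot) !== false) {
            // Crawler: no <a> tag, so there is no outbound link to follow
            return $safeText;
        }
    }
    // Browser or unknown agent: normal link
    return '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">'
        . $safeText . '</a>';
}
```

In a template it would be called as `link_for_agent($url, $title, isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '')`.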
  
GazJ






Posted: Fri Apr 01, 2011 6:33 pm

Good thinking, I will adjust my code to exclude bots. Thanks :)

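Oh, and the code needs updating, bud: there's an eregi() in it (the ereg functions are deprecated as of PHP 5.3). Here it is with preg_match() instead: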

Code:
/**
 * @author Guardian
 * @return bool TRUE if the current user agent looks like a known spider
 * EXAMPLE USAGE:
 * if(!is_spider()) {
 * // display hidden content here
 * }
 */
function is_spider(){
  $spiders = array(
    'Googlebot', 'Yammybot', 'Openbot', 'Yahoo', 'Slurp', 'msnbot',
    'ia_archiver', 'Lycos', 'Scooter', 'AltaVista', 'Teoma', 'Gigabot',
    'Googlebot-Mobile'
    );
  // Some clients send no User-Agent header at all
  $userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
  // Loop through each spider and check if it appears in
  // the User Agent (case-insensitive; preg_quote escapes regex characters)
  foreach ($spiders as $spider) {
    if (preg_match('/'.preg_quote($spider, '/').'/i', $userAgent)) {
      return TRUE;
    }
  }
  return FALSE;
}
  
Guardian






Posted: Fri Apr 01, 2011 8:03 pm

Good catch, I have updated the snippet.
Thanks also for the PM - I don't even have a clue where that newuser.css file came from lol
  
GazJ






Posted: Fri Apr 01, 2011 9:57 pm

I saw the notice about spam emails on your registration page. I know you already fixed the issue, but I was searching for a solution to this myself, as my site had over 2000 users and all but a few were, let's say, not right.

Anyway, I found this function for Drupal, modified it slightly and added it to Nuke's validate_mail function:

Code:
function user_validate_bogus_email($mail){

    // Based on http://drupal.org/node/780476#comment-2887028
    // Returns FALSE if $mail contains any entry of the spam list, TRUE otherwise.
    // Read the list from the local filesystem (path relative to the Nuke root,
    // where Nuke scripts run) instead of over HTTP via $nukeurl.
    // One entry per line; newlines and blank lines are dropped by file().
    $spamlist_array = @file('includes/spamlist.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($spamlist_array === false) {
        // List missing or unreadable: don't block registrations because of it
        return true;
    }

    foreach ($spamlist_array as $value){
        // trim() also removes the "\r" left behind by Windows line endings
        $realvalue = trim($value);
        if ($realvalue === '') {
            continue;
        }
        // Any listed entry found in the address marks it as bogus
        if (stripos($mail, $realvalue) !== false){
            return false;
        }
    }
    // Only after checking every line do we know the address is not listed
    return true;
}

function validate_mail($email) {
    // Anchored with ^...$ (and D, so a trailing newline is rejected): the whole
    // string must look like an address, not merely contain one somewhere.
    if(strlen($email) < 7 || !preg_match("/^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/D", $email) ||
       user_validate_bogus_email($email) === false) {
// These next 3 lines have been commented out by Raven on 1/14/2007.
// Reason being, this function should only validate the email and return to the calling script.
// The calling script should handle the validation results.
//        OpenTable();
//        echo _ERRORINVEMAIL;
//        CloseTable();
        return false;
    } else {
        return $email;
    }
}


Spam list link: http://compuweb.com/url-domain-bl.txt

Any thoughts on this? As of yet it is untested. I will also be redoing the user registration to confuse things a little.
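One possible refinement, offered as an untested sketch: a substring check flags any address that merely contains a listed string, so a short entry can also block unrelated addresses. Comparing only the part after the @ avoids that. `email_domain_is_blacklisted` is a hypothetical name; it assumes the list holds one bare domain per line (for example as returned by `file(..., FILE_IGNORE_NEW_LINES)`), and unlike `user_validate_bogus_email` it returns TRUE when the address is blacklisted.

```php
<?php
// Hypothetical domain-only blacklist check: a listed "example.com" blocks
// "user@example.com" and "user@mail.example.com", but not "user@notexample.com".
function email_domain_is_blacklisted($email, array $blacklist)
{
    $at = strrpos($email, '@');
    if ($at === false) {
        return false; // not an address at all; leave that to validate_mail()
    }
    $domain = strtolower(substr($email, $at + 1));
    foreach ($blacklist as $entry) {
        $entry = strtolower(trim($entry)); // also strips "\r" from CRLF files
        if ($entry === '') {
            continue;
        }
        // Exact match, or a subdomain of the listed domain
        if ($domain === $entry
            || substr($domain, -strlen($entry) - 1) === '.' . $entry) {
            return true;
        }
    }
    return false;
}
```

Loading the list once and passing it in also means the file is read only where it is needed, rather than inside the check itself.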
  
Guardian






Posted: Fri Apr 01, 2011 11:50 pm

It really depends on what you are trying to achieve.
I spent a long time trying to find something to correctly validate an email address for a form builder class I'm working on, and the consensus seems to be that it is something of a holy grail.
I have not yet seen any single piece of code that validates an email address 100% correctly to the required RFC specifications.

If you are validating for "the address conforms to the RFC specification", the one used in RN is probably the closest, as I know Raven did a lot of research, but it is really heavy on resources due to those regexes.

I was speaking with one of the Facebook guys a few weeks ago about this very thing, and he said they use the perfect code (which he wouldn't share), but in actual fact they don't. If you try to change your Facebook email address to a legitimate address with a hyphen in it, it falls over, unless they have fixed it.

What I'm doing at the moment, for the sake of efficiency, is:
Code:


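$email = 'you@atyourdomain.com';
if(filter_var($email, FILTER_VALIDATE_EMAIL) !== false) {
    // this is valid, proceed
}
elseif(validate_mail($email) !== false) {
    // filter_var() rejected it, but the RN function in mainfile.php accepted it
}
else {
    // both checks failed: reject the address
}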
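The same "cheap check first" idea can be wrapped in a small function, sketched here under assumptions: `email_is_valid` is a hypothetical name, and `$fallback` stands in for whatever slower validator is used (for example the mainfile.php function), called only when PHP's built-in `FILTER_VALIDATE_EMAIL` says no.

```php
<?php
// Hypothetical wrapper: PHP's built-in filter first, slower fallback second.
// $fallback is any callable that returns false for an invalid address.
function email_is_valid($email, $fallback = null)
{
    if (filter_var($email, FILTER_VALIDATE_EMAIL) !== false) {
        return true;
    }
    return $fallback !== null && call_user_func($fallback, $email) !== false;
}
```

For example, `email_is_valid($email, 'validate_mail')` only reaches the heavier regex for addresses the built-in filter rejects. The built-in filter accepts hyphens in both the local part and the domain, so an address like first-last@my-domain.com passes on the first check.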


If you are trying to validate it as "this doesn't belong to a spammer", you might want to hang on a couple of weeks for Site Guardian to be released, as I'm building hooks into RNYA to prevent known bad domains from being used for registrations. And I have a LOT of them :)
  
GazJ






Posted: Sat Apr 02, 2011 12:38 am

Well, I'm currently working on a new site for myself, so I have time to wait, as cleaning up stock Nuke with the latest patch takes a while (damn eregs lol).
Also, I'm replacing intval() calls with (int) casts and doing other performance-related work, just to help speed things up without the use of a cache. Then it's on to rewriting the Your Account, News and Downloads modules, so there's a lot to do. So yup, I can wait lol
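For anyone doing the same clean-up, a hedged before/after sketch of the two changes mentioned above. The helper names `is_safe_module_name` and `request_id` are made up for illustration; what the sketch relies on is that the POSIX ereg functions were deprecated in PHP 5.3 and removed in PHP 7, and that `intval()` with its default base 10 returns the same value as an `(int)` cast.

```php
<?php
// 1) Before: eregi('^[a-z0-9_]+$', $name)
//    After: a PCRE pattern with delimiters and the "i" (case-insensitive)
//    modifier; "D" stops "$" from also matching before a trailing newline.
function is_safe_module_name($name)
{
    return preg_match('/^[a-z0-9_]+$/iD', $name) === 1;
}

// 2) Before: intval($raw)
//    After: an (int) cast, same result for base 10 without a function call.
function request_id($raw)
{
    return (int) $raw;
}
```

Note that `preg_match()` returns 1, 0 or false, which is why the result is compared with `=== 1` rather than used directly.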
  
kguske
Site Admin


Joined: May 12, 2005
Posts: 876

Posted: Tue Apr 05, 2011 9:55 pm

GazJ, regarding my other post referring you to Site Guardian...looks like you already know of it.
All times are GMT - 5 Hours
 
Powered by phpBB © 2001-2008 phpBB Group

