PHP Word N-grams Generator

Here’s a class I wrote that can generate word n-grams from a sentence in PHP. So far I haven’t found any word n-gram implementations, hence my own code. Inline documentation provided, and basic unit tests included. Want to run it immediately? It’s also available on PHPFiddle.

<?php 
/**
 * PHP word n-grams generator (order of n-grams doesn't matter)
 */

class WordNGram
{
    /**
     * Get unigrams
     * @param string $sentence - Sentence to use
     * @return array
     */
    static function get_unigrams($sentence)
    {
        return self::_tokenize($sentence);
    }

    /**
     * Get n-grams
     * @param array $unigrams - Unigrams
     * @param array $n_1grams - Previous n-grams, i.e. to get digrams, send unigrams in this parameter
     * @param int $grams - Value of n in n-grams, e.g. 2 means digram, etc. (not currently used by the implementation)
     * @return array 
     */
    static function get_ngrams($unigrams, $n_1grams, $grams)
    {
        $ngrams = array();
        for($i = 0; $i < count($n_1grams); $i++)
        {
            $base = $n_1grams[$i];
            $base_last = is_array($base) ? $base[count($base) - 1] : $base;
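            // Find index of the base's last word in unigrams; array_search returns the first match only,
            // so sentences with repeated words are not handled correctly
            $last_pos = array_search($base_last, $unigrams);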
            for($j = $last_pos + 1; $j < count($unigrams); $j++)
            {
                $ngram = self::_array_flatten(array($base, $unigrams[$j]));
                $ngrams[] = $ngram;
            }
        }
        return $ngrams;
    }
    
    /**
     * Split string into words, using any run of non-word characters as delimiter
     * @param string $sentence - Sentence to split
     * @return array
     */
    static function _tokenize($sentence)
    {
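        // PREG_SPLIT_NO_EMPTY drops the empty tokens produced by leading/trailing punctuation
        return preg_split("/[\W]+/", $sentence, -1, PREG_SPLIT_NO_EMPTY);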
    }

    /**
     * Flatten array
     * Source: brownelearning.org/blog/2012/04/quick-way-to-flatten-multidimensional-arrays-in-php/
     * @param array $input - Array to flatten
     * @return array
     */
    static function _array_flatten($input)
    {
        $output = array();
        if (is_array($input)) 
        {
            foreach ($input as $element)
            {
                $output = array_merge($output, self::_array_flatten($element));
            }
        }
        else
        {
            $output[] = $input;
        }
        return $output;
    }
}

class WordNGramTests
{   
    static function assert($result, $expected, $tag)
    {
        if ($result == $expected)
        {
            echo "<p>Passed $tag</p>";
        }
        else
        {
            echo "<p><b>Failed $tag</b></p>";
        }
    }
    
    static function run()
    {
        $unigrams = WordNGram::get_unigrams('a b c d');
        $digrams = WordNGram::get_ngrams($unigrams, $unigrams, 2);
        self::assert($digrams[0], array('a', 'b'), 'digrams1');
        self::assert($digrams[1], array('a', 'c'), 'digrams2');
        self::assert($digrams[2], array('a', 'd'), 'digrams3');
        self::assert($digrams[3], array('b', 'c'), 'digrams4');
        self::assert($digrams[4], array('b', 'd'), 'digrams5');
        self::assert($digrams[5], array('c', 'd'), 'digrams6');
        
        $trigrams = WordNGram::get_ngrams($unigrams, $digrams, 3);
        self::assert($trigrams[0], array('a', 'b', 'c'), 'trigrams1');
        self::assert($trigrams[1], array('a', 'b', 'd'), 'trigrams2');
        self::assert($trigrams[2], array('a', 'c', 'd'), 'trigrams3');
        self::assert($trigrams[3], array('b', 'c', 'd'), 'trigrams4');
        
        $tetragrams = WordNGram::get_ngrams($unigrams, $trigrams, 4);
        self::assert($tetragrams[0], array('a', 'b', 'c', 'd'), 'tetragrams1');
        
        $unigrams = WordNGram::get_unigrams('hello, slim shady');
        self::assert($unigrams[0], 'hello', 'unigramsSentences1');
        self::assert($unigrams[1], 'slim', 'unigramsSentences2');
        self::assert($unigrams[2], 'shady', 'unigramsSentences3');
        
        $digrams = WordNGram::get_ngrams($unigrams, $unigrams, 2);
        self::assert($digrams[0], array('hello', 'slim'), 'digramsSentences1');
        self::assert($digrams[1], array('hello', 'shady'), 'digramsSentences2');
        self::assert($digrams[2], array('slim', 'shady'), 'digramsSentences3');

        $unigrams = WordNGram::get_unigrams('my name is, slim shady');
        self::assert($unigrams[0], 'my', 'unigramsSentences21');
        self::assert($unigrams[1], 'name', 'unigramsSentences22');
        self::assert($unigrams[2], 'is', 'unigramsSentences23');
        self::assert($unigrams[3], 'slim', 'unigramsSentences24');
        self::assert($unigrams[4], 'shady', 'unigramsSentences25');

        $digrams = WordNGram::get_ngrams($unigrams, $unigrams, 2);
        self::assert($digrams[0], array('my', 'name'), 'digramsSentences21');
        self::assert($digrams[1], array('my', 'is'), 'digramsSentences22');
        self::assert($digrams[2], array('my', 'slim'), 'digramsSentences23');
        self::assert($digrams[3], array('my', 'shady'), 'digramsSentences24');
        self::assert($digrams[4], array('name', 'is'), 'digramsSentences25');
        self::assert($digrams[5], array('name', 'slim'), 'digramsSentences26');
        self::assert($digrams[6], array('name', 'shady'), 'digramsSentences27');
        self::assert($digrams[7], array('is', 'slim'), 'digramsSentences28');
        self::assert($digrams[8], array('is', 'shady'), 'digramsSentences29');
        self::assert($digrams[9], array('slim', 'shady'), 'digramsSentences30');

        $trigrams = WordNGram::get_ngrams($unigrams, $digrams, 3);
        self::assert($trigrams[0], array('my', 'name', 'is'), 'trigramsSentences1');
        self::assert($trigrams[1], array('my', 'name', 'slim'), 'trigramsSentences2');
        self::assert($trigrams[2], array('my', 'name', 'shady'), 'trigramsSentences3');
        self::assert($trigrams[3], array('my', 'is', 'slim'), 'trigramsSentences4');
        self::assert($trigrams[4], array('my', 'is', 'shady'), 'trigramsSentences5');
        self::assert($trigrams[5], array('my', 'slim', 'shady'), 'trigramsSentences6');

        $tetragrams = WordNGram::get_ngrams($unigrams, $trigrams, 4);
        self::assert($tetragrams[0], array('my', 'name', 'is', 'slim'), 'tetragramsSentences1');
        self::assert($tetragrams[1], array('my', 'name', 'is', 'shady'), 'tetragramsSentences2');
        self::assert($tetragrams[2], array('my', 'name', 'slim', 'shady'), 'tetragramsSentences3');

        $pentagrams = WordNGram::get_ngrams($unigrams, $tetragrams, 5);
        self::assert($pentagrams[0], array('my', 'name', 'is', 'slim', 'shady'), 'pentagramsSentences1');
    }
}

WordNGramTests::run();
?>

SVN Local Repository

Disclaimer: this is not a tutorial about SVN or version control, but here’s a very basic introduction anyway. Apache Subversion, also called SVN, is a software versioning and revision control system, very useful for keeping historical versions of files. To do this, you need an SVN client and an SVN server. The client is on your computer, with your files, and you sync file changes to an SVN server. You can sign up for various SVN services online and you can even set up your own SVN server. However, if you want to keep track of changes to your files without signing up to some online SVN service, or going through some complicated process of setting up a server, here are a few steps that can help you set up a local repository on your computer. Do the following after you’ve installed an SVN client.

  1. Create a folder somewhere that will serve as the base of all your repositories, e.g. mkdir /home/admin/repos
  2. Use the command line/terminal and navigate into the base folder, i.e. cd /home/admin/repos
  3. Run the command svnadmin create repo-name, where repo-name is the name of the repository you want to create
  4. Next, note the folder where the files you want to version are located, e.g. /home/admin/Desktop/myproject
  5. Run svn checkout file:///home/admin/repos/repo-name  /home/admin/Desktop/myproject
  6. Now the repository is ready to receive files and track changes
  7. To add files to track, use svn add /home/admin/Desktop/myproject/file-name
  8. For the final step, you need to be in the directory with the files, i.e. cd /home/admin/Desktop/myproject
  9. Run svn commit -m "Add some comment here"
  10. That’s all folks! Now you can roll back changes if you have some problems, and also see the history of changes to your files, very useful for code or even your resume/CV. The command to see the files in your repository is svn ls file:///home/admin/repos/repo-name. Reverting/rolling back to older versions and other advanced topics can be explored at http://subversion.apache.org/

Assumptions (change appropriately)

  • The folder with your files is in /home/admin/Desktop and the name of the folder is myproject
  • The name of the repository you want to create is repo-name
  • The file whose history you want to track is called file-name
  • This is more of a self-reference post so I can remember the sequence of commands to use for creating my SVN local repositories
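
For quick reference, the numbered steps above can be collected into one small shell function. This is only a sketch under a few assumptions of mine: the function name setup_local_repo and the commit message are made up, the paths are absolute and contain no spaces, and svn add --force . is used to add everything in the folder at once instead of adding one file-name at a time as in step 7.

```shell
#!/bin/sh
# Sketch: create a local SVN repository and put an existing folder under version control.
# Arguments (hypothetical example values, change as needed):
#   $1 = base folder for repositories, e.g. /home/admin/repos
#   $2 = repository name, e.g. repo-name
#   $3 = folder with the files to version, e.g. /home/admin/Desktop/myproject
setup_local_repo() {
    base=$1
    name=$2
    project=$3

    mkdir -p "$base" "$project" || return 1
    # Create the repository inside the base folder
    svnadmin create "$base/$name" || return 1
    # Turn the project folder into a working copy of the (still empty) repository
    svn checkout "file://$base/$name" "$project" || return 1
    # Schedule everything in the folder for addition, then commit from inside it
    (cd "$project" && svn add --force . && svn commit -m "Initial import") || return 1
    # List what the repository now contains
    svn ls "file://$base/$name"
}
```

With the example values from this post, the call would be setup_local_repo /home/admin/repos repo-name /home/admin/Desktop/myproject.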

PHP Web Application Development Part 1

This is a collection of notes from a professional development series I am doing on web application development using PHP. Even though I am using PHP, it’s just one language, and many of the concepts are implemented in other languages. The focus is on the concepts behind web application development.

Part 1 – Building a Weibo Isotope with CodeIgniter

You will need to download these to get ready for the hands-on parts of the seminar (no need to install yet):

It’s advisable to skim through the following reading:

Next steps that we will cover at the seminar (coming this Saturday):

  • Setting up server environment
  • Setting up CodeIgniter
  • Integrating 960 Grid System
  • Integrating Smarty
  • Integrating RedBeanPHP

To be continued…

Trojan:Win32/Sirefef

Prologue

This infection occurred on a friend’s laptop, running 32-bit Windows 7 Ultimate and Avast! Antivirus free edition. Research on the Internet says this is a very severe rootkit trojan, hard to remove, and includes keyloggers plus other spyware that can steal banking login information. Thanks to various forums, I attempted to remove this trojan and below are some details of how I did it.

Symptoms

  • Hotmail reported the account password was wrong, and on one occasion reported that the account had been deleted and doesn’t exist anymore. However, when trying from another non-infected computer, login was successful.
  • From what I could tell, Internet Explorer had been hijacked and all of the search results were turning up bogus website URLs. For example, clicking on a valid Microsoft Support link showing in Google Search redirected to a bogus website.
  • Windows Firewall reported an error with its settings but turning the firewall on or off was disabled. The firewall service couldn’t be restarted either.
  • Ammyy Admin was running and could not be stopped via the Task Manager, and its executable file (named AA_v3[1].exe) could not be located in the file system for deletion either. Ammyy was also installed as a service. Ammyy is a remote access program, not a virus itself, but it was probably being used to gain remote access to the system.
  • Avast did not report any infection and was partly disabled. Attempts at enabling it and scanning were not successful.
  • Microsoft Security Essentials (MSE) reported a severe infection at initial scanning and identified it as the Win32/Sirefef trojan.

Removal

Given the severity of the trojan, and that my friend didn’t have the option to reinstall the operating system, I basically used a lot of tools to check, double-check, clean, and double-clean the system. I got to know about these tools from various forums. Below is the list of tools I downloaded, installed and ran, in order of execution:

Repair

After completing the removal process, I used the following tools to repair system alterations and damages to files caused by the trojan.

  • ESET ServicesRepair Tool: This tool replaces various system files that may have been infected by the trojan.
  • RogueKiller: This tool is useful to repair various registry, hosts files, proxy, MBR, and driver corruptions caused by the trojan.

Epilogue

After all that, was the trojan removed? It seems so: the symptoms are gone and the antivirus and antimalware tools haven’t sounded any more alarms of an infection. In this case, I had to try cleaning the trojan, but the best advice is to do a fresh install of the system if you can. Also, I ran ESETSirefefremover at the end and it confirmed that the Win32/Sirefef trojan and its variants were not detected on the system.

Note: The ESET tools are linked to their executables for clarity, but here is the page where I got the executable links

Finding stuff on the Unix terminal

On Linux or other Unix-based systems, the terminal can be a powerful interface for locating and searching for stuff on your hard drive. The visual interface and search box may have limited options. Ever tried looking for files with some text in them? Here are two commands that I’ve found very useful on the terminal, and I actually don’t use the search box interface anymore.

To find files by their names or wild cards, use:

find -iwholename "*name*.ext"

The quotes are where you can put in what you want to look for. This command will look in the current directory and its sub-directories for files whose path ends with “.ext” and contains “name” somewhere before that. The -iwholename is just one of the many options in this command, and the -i part ensures the search is not case sensitive. For more info on the find command, here’s a link to the online man pages: http://bit.ly/qNtN93

To find files by searching what’s inside them, use:

grep -ilr "sometext" ./

I am assuming these are text files, but there are more complex options for searching binary files too. Have a look at the online man pages of the grep command for more options: http://bit.ly/p53tsq. The -ilr options together give us a nice and tidy output. The -l part prints just the names of the files in which “sometext” was found, while the -i part makes searching case insensitive, and the -r part allows looking in sub-directories. The final “./” sets the current directory as the starting point for searching.
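
To see both commands in action without touching real files, here is a small sandbox sketch. The directory layout, file names and contents are all made up for illustration. One deliberate difference from the command above: it uses -iname rather than -iwholename, which matches only the file name and, as far as I know, is accepted by both GNU and BSD find.

```shell
# Build a throwaway directory tree to try the two commands on.
# Every file name and its contents below are invented for this example.
demo=$(mktemp -d)
mkdir -p "$demo/docs/old"
printf 'Sometext is here\n' > "$demo/docs/MyName.ext"
printf 'nothing relevant\n' > "$demo/docs/old/rename_me.ext"
printf 'SOMETEXT again\n'   > "$demo/docs/old/notes.txt"

# Files whose name matches *name*.ext, ignoring case (sub-directories are searched too)
by_name=$(find "$demo" -iname "*name*.ext" | sort)

# Files whose contents mention "sometext", ignoring case
by_content=$(grep -ilr "sometext" "$demo" | sort)

printf 'Matched by name:\n%s\n' "$by_name"
printf 'Matched by content:\n%s\n' "$by_content"
```

The name search should pick up the two .ext files, while the content search should pick up MyName.ext and notes.txt, since rename_me.ext does not contain the text.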

Notes to self 

  • This is tested on my Debian virtual OS, does it work on other distros?
  • How do I make find search in sub-directories? (It already does by default; -maxdepth 1 limits it to the current directory.)
  • Can I make find and grep work together (i.e. piping, etc) so I can search for only file types that have a certain keyword?
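
On the last question, one possible way to combine the two is to let find pick the files by name and hand them to grep through -exec. This is a sketch rather than the only way to do it, and the function name search_in_files is something I made up for the example.

```shell
# search_in_files DIR NAME_PATTERN TEXT
# Prints the names of regular files under DIR whose name matches NAME_PATTERN
# (case insensitive) and whose contents contain TEXT (case insensitive).
search_in_files() {
    # "{} +" passes many file names to a single grep call instead of one call per file
    find "$1" -type f -iname "$2" -exec grep -il "$3" {} +
}
```

For example, search_in_files ./ "*.txt" "sometext" would list only the .txt files under the current directory that mention “sometext”.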

Open Microsoft Access Files

This may seem like a weird title! After all, what’s the big deal with opening .mdb files? However, there are some of us out there (me included) who don’t use Microsoft Office. Instead, we rely on free stuff like OpenOffice. So I found a nice utility called “MDB Browser and Editor” that lets you open your legacy .mdb databases 🙂 It’s free and I gave it a try; it works pretty well. So check it out if you need to.

PS – Actually, OpenOffice does have a way to import existing Microsoft Access files into its database format. However, I couldn’t get this to work with my old password-protected .mdb database.

Useful Registry Locations

I’ve come across these keys by searching on the Internet. I use these keys for detecting viruses, troubleshooting stuff, and other geeky things. The convention I’m using in writing out the keys is MainKey\SubKey1\SubKey2\SubKeyEtc > Value (I’m not stating the data)

"Windows Shell"="My Computer\\HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Winlogon"
"Run on Startup"="My Computer\\HKEY_CURRENT_USER\\Software\\Microsoft\\Windows\\CurrentVersion\\Run"
"Run on Startup 2"="My Computer\\HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run"
"Session Manager"="My Computer\\HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Control\\Session Manager"
"Default Wallpaper"="My Computer\\HKEY_USERS\\.DEFAULT\\Control Panel\\Desktop"
"IIS Ports"="My Computer\\HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Control\\ServiceProvider\\ServiceTypes\\w3svc"
"Show Hidden Files"="My Computer\\HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Explorer\\Advanced\\Folder\\Hidden\\SHOWALL"
"Shell Context Menu"="My Computer\\HKEY_CLASSES_ROOT\\Directory\\shell"
"My Computer Context Menu"="My Computer\\HKEY_LOCAL_MACHINE\\SOFTWARE\\Classes\\CLSID\\{20D04FE0-3AEA-1069-A2D8-08002B30309D}\\shell"
"File Context Menu"="My Computer\\HKEY_CLASSES_ROOT\\*\\shell"
"System Services"="My Computer\\HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services"
"Registry Favorites"="My Computer\\HKEY_CURRENT_USER\\Software\\Microsoft\\Windows\\CurrentVersion\\Applets\\Regedit\\Favorites"

1. Windows Shell – This is the place where the OS sets its shell “explorer.exe”. Viruses often target this and inject other files to start up

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon > Shell

2. Run on Startup – The place programs use to start with the OS. There are two places in the registry that hold this, one for the system generally, and one specific to the logged in user.

HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Run

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run

3. Default Wallpaper – Sets the default wallpaper of your desktop (the one that shows when no one is logged in yet)

HKEY_USERS\.DEFAULT\Control Panel\Desktop

4. My Computer Context Menu – Things that show up when you right-click the My Computer icon on the desktop (not sure if the CLSID number value will be the same)

HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{20D04FE0-3AEA-1069-A2D8-08002B30309D}\shell

5. System Services – Listing of all system services. Useful for removing them manually

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services

Copy Path

Copy Path Shell Extension by Vertigo Software is one of those pieces of software that I would call essential. It lets you get the path or directory location of any file on your Windows system by right-clicking on the file. Here’s the site link.

In addition, if you need to copy UNIX-like paths, you can try out ClipPath, another super piece of software that also shows up in your context menu. I think these kinds of utilities should be built into modern operating systems. Windows 7 guys, are you listening?!

Cool AckerPack

Another good piece of software for moi! Great for packaging presentations with dependent files into one executable. It works like compression software, but has the ability to set a default file to launch from the packed file.

I found this so useful and necessary just a while ago. There was a presentation I had to provide, and it had an additional file with it with a link on the slide. So I needed to provide everything in one place. Zipping was an option, but I also needed to provide a simple way for the user to run the presentation. Unzipping the file, then selecting the presentation file (and remembering its name) wasn’t a failsafe plan. Much better if the user could just click on one file and everything starts working. And AckerPack did just that!

Here’s the author’s description:

AckerPack instantly compresses any folder into a self-extracting executable! Unlike old ZIP-based tools such as WinZIP, with AckerPack you choose where the files should be unpacked and which compressed file to open after installation. Because you have complete control over the process, AckerPack makes an ideal tool for building eBooks or simple software installations.

Believe it or not, packaging up an entire folder for delivery over the internet only takes three clicks!! Just right-click on any folder and select AckerPack Folder. AckerPack compresses up to 30% better than WinZip and produces a much smarter executable which doesn’t confuse the end-user.

Unfortunately, the developer site has been down for some time now, but here’s the URL just in case: AvatarSoft.com. Here’s an alternate download for it: Softpedia.com

Thanks to Softpedia for keeping a mirror of AvatarSoft 🙂

Wax Movie Editor

This post is more of a memory note for me than anything else. I often find cool and free software, but don’t have any use for them at that moment. Later on when I do need them, I’ve already forgotten their names or the URL to get them from. So this is my series on free software.

I gave Wax Movie Editor a try and it certainly looks good. Sort of like the familiar Windows Movie Maker, but with extra functions. Here’s the link: http://www.debugmode.com/wax/