Artificial Intelligence


Artificial intelligence deals with mimicking the way the human brain works or evolution of life and other such natural phenomena. Here are some of the artificial intelligence techniques:

  1. Artificial Neural Networks
  2. Fuzzy logic
  3. Genetic algorithms
  4. Cellular automata

Artificial Neural Networks are inspired by the way the brain works. A neural network consists of a network of nodes. Each node is capable of making a simplistic decision or a simple calculation on its inputs, and providing an output. By interconnecting a large number of such nodes, it is possible to do data analysis and complex decision making. Each node has a threshold assigned and each connection between nodes is assigned a weight. To begin with, a network with random weights and thresholds is constructed. This is then “trained” by providing inputs, and comparing the outputs it gives against known correct outputs. When a mismatch is detected between the current output and the desired output, the weights and thresholds are suitably modified.
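The training loop described above can be sketched for a single node. This is only a minimal sketch, assuming a step-function node and the classic perceptron update rule; the names `predict` and `train_node`, the learning rate and the AND-gate sample data are illustrative assumptions, not something from a particular library.

```python
import random

def predict(weights, threshold, inputs):
    """A node outputs 1 when the weighted sum of its inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

def train_node(samples, rate=0.1, epochs=1000, seed=0):
    """Start from random weights and a random threshold, adjust after every mismatch."""
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    weights = [rng.uniform(-1, 1) for _ in range(n_inputs)]
    threshold = rng.uniform(-1, 1)
    for _ in range(epochs):
        mismatches = 0
        for inputs, wanted in samples:
            error = wanted - predict(weights, threshold, inputs)
            if error != 0:
                mismatches += 1
                # Move the weights towards the desired output...
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                # ...and shift the threshold the opposite way.
                threshold -= rate * error
        if mismatches == 0:
            break  # every sample already gives the desired output
    return weights, threshold

# Known outputs for a logical AND of two inputs (illustrative data).
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, threshold = train_node(AND_SAMPLES)
print([predict(weights, threshold, x) for x, _ in AND_SAMPLES])
```

A real network connects many such nodes in layers and uses a more elaborate update rule, but the idea of nudging weights and thresholds after each mismatch is the same.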

Genetic algorithms are based on the process of evolution and natural selection. To begin with, a pool of random “algorithms” is built. Each such algorithm is tested with a given input against the required output. Those algorithms that give results closest to the desired outcome are selected. Thereafter the next generation is built by combining pairs of algorithms from the previous generation and adding more steps (random mutation). This generation is again tested for fitness. This process is repeated as many times as needed. The algorithms keep increasing in complexity with each generation.
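The select, combine and mutate cycle can be sketched as below. As a simplifying assumption, each “algorithm” is reduced to a fixed-length list of bits and the required output is an all-ones list; the function names, pool size and mutation rate are made up for the illustration.

```python
import random

TARGET = [1] * 20  # the "required output" (an assumption for this demo)

def fitness(candidate):
    """Count the positions that match the target; higher means closer."""
    return sum(1 for got, want in zip(candidate, TARGET) if got == want)

def crossover(a, b, rng):
    """Combine a pair from the previous generation at a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(candidate, rng, rate=0.05):
    """Flip each bit with a small probability (random mutation)."""
    return [1 - bit if rng.random() < rate else bit for bit in candidate]

def evolve(pool_size=30, generations=40, seed=1):
    rng = random.Random(seed)
    pool = [[rng.randint(0, 1) for _ in TARGET] for _ in range(pool_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        pool.sort(key=fitness, reverse=True)
        history.append(fitness(pool[0]))
        parents = pool[: pool_size // 2]  # selection: keep the fittest half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pool_size - len(parents))
        ]
        pool = parents + children  # parents survive unchanged
    pool.sort(key=fitness, reverse=True)
    history.append(fitness(pool[0]))
    return pool[0], history

best, history = evolve()
print(history[0], "->", history[-1])
```

Because the fittest half is carried over unchanged, the best fitness never drops from one generation to the next.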

Please refer to Cellular automaton for more information on cellular automata and Fuzzy logic for more information on Fuzzy Logic.


Oracle performance tuning – an update

A tutorial is now available to perform Oracle performance tuning of applications and SQL statements. This tutorial has been expanded to include case studies, which will go a long way in better understanding of concepts explained.

Link to the PDF tutorial here: Tuning.pdf

This current posting is an extension to the existing post on performance tuning, which you can still refer to – for more resources on the topic.



Releasing: Gurbani search for mobile


Gurbani searching on the mobile used to be tough. Symbian and Android phones do not support rendering Unicode text in Indic scripts, which is required to use Gurmukhi websites. Opera Mini allowed us to read Gurbani, but entering search text is a different matter.

I have developed a web page that allows you to do this by entering roman letters instead. For example, enter ‘k’ instead of ‘ਕ’.

To use this go to: http://tinyurl.com/gfind.


For best results, mobile users can use this within Opera Mini after one configuration change: enter config: as a URL, which takes you to the advanced settings page, set Use bitmap fonts for complex scripts to Yes, and then Save. Opera Mini uses cloud computing to render Gurmukhi text, and therefore doesn’t need that support on the mobile itself.

That being said, this tool can be used anywhere you like – on the desktop or any mobile browser. Detailed instructions for using it are on the page itself.

This is envisaged to be updated in future to allow search options powered by various other websites, as and when more powerful search is available. Click here for a short history of Gurbani search.


7 habits of highly effective programmers


I recently came across a list of seven rules for beginning programmers. I could agree with only one of the rules – each procedure should have a purpose, an input and a defined output. However, programming is an ecosystem of related disciplines, and the rules ought to control not just coding. Here is my attempt:

1. A successful program is one which meets the customers’ requirements, is flexible (well designed) and requires least effort from the programmer. Such a successful program has three ingredients: Plan, Plan and Plan. If you intend to spend an hour doing the coding, spend 20 minutes up front to plan the design.

2. Procedures should not be created just because it’s written about as a good programming practice. Each procedure should have a need, an objective and a clear input with a defined output. Procedures should not modify global variables, to the extent possible.

3. Unless you plan to complete the program in one sitting, or if you think you might need to tweak or debug it later (which is true in most cases), start maintaining notes on the highlights of the design as early as possible. I normally add comments within the code mentioning future enhancements or improvements to the coding design to be done as phase II.

4. Assume that your program will need debugging, and enable that while writing the code itself. Create a debug flag, and emit verbose details when that flag is enabled. This one is most difficult to implement with beginner programmers.

5. When I need to develop code for a new requirement, in the ‘first draft’ I may write only the pseudo code for certain sections where I will need to check language features, or I am not sure of the syntax. Later, I grapple with completing the code. This allows me to focus on the overall algorithm during the first draft, and saves overall time debugging.

6. Simple coding of an efficient design will help, not the other way around. Even in the first draft one needs to focus on the efficiency of the code, only from an algorithm perspective. For example, if I need to extract data from a database, I will ensure I extract only the minimum number of rows I need. I will ignore the fact that the loop needs to do an extra iteration.

7. There are many more rules, and yet an effective programmer is one who knows when to break the rules. Coding is beyond all rules, like a poem. Quoting from The Tao of Programming by Geoffrey James:

A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity.

A program should follow the ‘Law of Least Astonishment’. What is this law? It is simply that the program should always respond to the user in the way that astonishes him least.
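Going back to rules 2 and 4 in the list above, here is a minimal sketch of a procedure with a clear input and a defined output that also emits verbose details when a debug flag is on. The environment variable `APP_DEBUG` and the function `invoice_total` are hypothetical names chosen for the illustration.

```python
import logging
import os

# Hypothetical switch: set APP_DEBUG=1 in the environment to get verbose output.
DEBUG = os.environ.get("APP_DEBUG") == "1"
logging.basicConfig(level=logging.DEBUG if DEBUG else logging.WARNING)
log = logging.getLogger("invoice_total")

def invoice_total(lines, tax_rate):
    """Clear input (line amounts, tax rate), defined output (total), no globals changed."""
    subtotal = sum(lines)
    log.debug("subtotal=%s for %d lines", subtotal, len(lines))
    total = round(subtotal * (1 + tax_rate), 2)
    log.debug("total=%s at tax_rate=%s", total, tax_rate)
    return total

print(invoice_total([100.0, 50.0], 0.2))
```

The debug statements stay in the code permanently; switching the flag on is all it takes to see what the procedure did when something goes wrong later.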

Happy coding, please add comments if you can think of rules more important than the ones I have stated. I will too.


Preparing for the PMP

This post is based on an interview with Piyush Singhal who cleared PMP recently with a 90+ score.

Okay, so you are thinking about going for the Project Management Professional (PMP) exam, and do not know where to start. Let’s get you started. Below is a project plan for clearing the PMP certification.

The first thing to do, is to obtain PMI membership. This might get you the PMBOK guide (Project Management Body of Knowledge) bundled. In addition, you get a discount on the exam fee and access to all PMI resources online for one year. You may also join a local chapter of PMI.

When he set out on operation PMP, he realised it would not be possible for him to read the entire PMBOK. So, he decided to use audio books for this purpose, and installed them onto his car audio that he could listen to while travelling. These are available from pmprepcast.com. He spent about six months listening to these, around 5 hours a week.

At this time, he tried to assess himself. There are PMP-like sample tests available on various websites. Working through multiple tests in simulation mode gives you an idea of how you are placed against the real one. You can take these four hour tests, always keeping the formula cheat sheet on hand.

For the last lap, he accepted three weeks of solitary confinement: leave from office, away from family, away from TV, reading about eight to ten hours a day. Then he decided he had had enough, woke up one morning – and appeared for the test. Sounds easy, doesn’t it?


Performance tuning tips


Today I will share with you a couple of tips on process performance tuning: rewriting your code to be faster.

This, for a change (contrasted with my previous posts on performance tuning), has nothing to do with Oracle or SQL: you can use these tips in any language.

When your code has been identified as having a performance issue, the first task is to go through the code with a fine-toothed comb from a performance perspective. Is there something that you can immediately notice and change?

After that is done, one should look at caching as one optimizing method. Caching is used in multiple domains, from web-browsing to microprocessor memory and there is no reason why your code should not benefit from it. As an example, an application had some logic to map department IDs: given a source department, it had to determine a target department. However, several different database tables needed to be consulted in a chain mode, some of which allowed ranges (e.g. for input department 1234A to 999B, have the output department as 526C). All this took substantial time, per department input. To solve the problem, we created a cache table: when the process looked up one department, it added that to the cache table (along with the corresponding output department). The next time it encountered the same input department ID, it would just pick up the value from the cache. How long to retain the cache is an important parameter, and your own requirements will need to decide that.

This can also be implemented using a Hash table (or a LoadLookup as some languages call it), rather than using a database table.
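The hash table variant can be sketched as follows. This is a minimal sketch, not the original application’s code: the class `DepartmentMapper`, the function `chained_lookup` and its made-up mapping rule only stand in for the slow chain of table lookups.

```python
class DepartmentMapper:
    """Remembers the target department for every source department seen so far."""

    def __init__(self, slow_lookup):
        self._slow_lookup = slow_lookup
        self._cache = {}
        self.misses = 0  # how many times the slow path had to run

    def target_for(self, source_dept):
        if source_dept not in self._cache:
            self.misses += 1
            self._cache[source_dept] = self._slow_lookup(source_dept)
        return self._cache[source_dept]

    def clear(self):
        """Call this when cached values may be stale (the retention decision)."""
        self._cache.clear()

def chained_lookup(source_dept):
    """Stand-in for the slow chain of table lookups (the rule here is made up)."""
    return "T-" + source_dept

mapper = DepartmentMapper(chained_lookup)
results = [mapper.target_for(d) for d in ["1234A", "999B", "1234A", "1234A"]]
print(results, "slow lookups:", mapper.misses)
```

Four requests cost only two slow lookups here; the saving grows with how often the same department IDs repeat in the input.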

The second tip is moving the decision making earlier in the flow. For example, with one of the reports, certain rows were selected for processing in the beginning. Each row was thereafter processed one by one. At that time, part of the logic was to check some row fields and determine that this particular row did not need to be processed. This check was moved upwards, during the time of initial selection. As a result, the overall processing time went down.
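A small sketch of the same idea, using in-memory rows instead of a database. The row layout, the `status` field and `expensive_step` are assumptions made for the illustration; in the real report the early check would most likely become part of the selection query.

```python
def expensive_step(row):
    """Stand-in for the costly per-row work (hypothetical)."""
    return row["amount"] * 2

def process_late_check(rows):
    """Original shape: select everything, decide row by row inside the loop."""
    handled, out = 0, []
    for row in rows:
        handled += 1  # the row was fetched and walked before being rejected
        if row["status"] != "OPEN":
            continue
        out.append(expensive_step(row))
    return out, handled

def process_early_check(rows):
    """Improved shape: the same condition is applied at selection time."""
    selected = [row for row in rows if row["status"] == "OPEN"]
    return [expensive_step(row) for row in selected], len(selected)

ROWS = [
    {"amount": 10, "status": "OPEN"},
    {"amount": 5, "status": "CLOSED"},
    {"amount": 7, "status": "OPEN"},
]
print(process_late_check(ROWS))
print(process_early_check(ROWS))
```

Both versions produce the same output; the second simply never carries the rejected rows through the rest of the flow.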

Please go apply these two tips in your projects and let me know your feedback.


Recursion: my two cents

A lot has been written on the use of recursion in computer programming, yet it remains one of the least understood aspects – especially for beginners. Having visited the Wikipedia page on recursion, I believe the text is hard to understand, and the examples are forced: there is no reason to use recursion to solve Fibonacci series or calculate factorials.

Recursion means solving a problem by splitting it into smaller problems of the same kind. If the problem is numerical, that means splitting it into smaller numbers.

Consider the problem of creating all permutations of the characters in a string. If ‘abc’ is input, the program should show ‘abc’, ‘acb’, ‘bac’, … and so on. How can we solve this problem?

[Figure: the recursion tree for permute]

We propose to write a function called ‘permute’ which creates permutations of the string that is passed to it. When I wanted to create this function, the first thing I did was to draw the tree. Just doing this showed me a flaw in the initial design that I was planning, and later helped me with the debugging. The tree shows that each letter in the input string is extracted from the string one by one, and the rest of the string is passed again as a parameter to the same function – the character removed is added to the prefix. Eventually the ‘string parameter’ becomes a single character, at which time the permutation is printed.

Have a look at the code, and relate it to the tree:

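#!/usr/bin/perl
use strict;
use warnings;

print("Please enter a string to permute->");
my $s = <STDIN>;
chomp($s);

permute($s, length($s), "");

# Print every permutation of $s, each one prefixed with $pref.
# $l is the length of $s.
sub permute {
    my ($s, $l, $pref) = @_;

    if ($l > 1) {
        for (my $i = 0; $i < $l; $i++) {
            # Pull out character $i and recurse on what is left.
            my $ch   = substr($s, $i, 1);
            my $rest = substr($s, 0, $i) . substr($s, $i + 1);
            permute($rest, $l - 1, $pref . $ch);
        }
    }
    else {
        # A single character is left: the permutation is complete.
        print $pref . $s . "\n";
    }
}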

As I said the first step in such recursive coding is to identify the tree. When I started out, I made the tree in an incorrect fashion. I split permute(abc) into a.permute(bc) and permute(bc).a. I felt that since the basic idea was to permute, this is how we should do it. However, doing this resulted in only four permutations at the bottom of the tree, instead of six as should be. This made me go back to the drawing board.
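To see the difference between the two trees, here is a short sketch that builds both. It is written in Python rather than Perl and returns lists instead of printing; `flawed_permute` is my reconstruction of the first attempt described above, so treat its exact shape as an assumption.

```python
def permute(s, prefix=""):
    """Correct tree: pull out each character in turn, recurse on the rest."""
    if len(s) <= 1:
        return [prefix + s]
    found = []
    for i, ch in enumerate(s):
        found.extend(permute(s[:i] + s[i + 1:], prefix + ch))
    return found

def flawed_permute(s):
    """First attempt: only a.permute(rest) and permute(rest).a are explored."""
    if len(s) <= 1:
        return [s]
    head, rest = s[0], flawed_permute(s[1:])
    return [head + p for p in rest] + [p + head for p in rest]

print(len(flawed_permute("abc")), flawed_permute("abc"))
print(len(permute("abc")), permute("abc"))
```

The flawed version never places the first character in the middle, which is why ‘bac’ and ‘cab’ go missing.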

I want to end this on a humorous note for people who write recursive solutions to simple problems:

To loop is human, to recur - divine.


Oracle deadlocks: the what and the how

Everyone knows what a deadlock is: a situation in which two or more competing processes are waiting for the other to finish, and thus neither ever does. The purpose of this post is to help people understand deadlocks a little better, with a view to enabling them to fix the problem when they find one.

Assume that there are two processes running, A & B, and that they require a (shared) file and a printer to do their work. Process A locks up the printer, and Process B locks up the file for its own use. Now, neither process can complete, because neither has all the resources it needs, and neither will release the resource it holds: each keeps on waiting for the second resource.

Let us create a deadlock now, using Oracle database and SQL Plus client.

We opened two sessions, and executed “set autocommit off” as the first statement.

Now in the first session we executed:

UPDATE ps_voucher SET grp_ap_id='A' WHERE voucher_id='00692096' AND invoice_dt='2-JAN-2002';

second session:

UPDATE ps_voucher SET grp_ap_id='A' WHERE voucher_id='00692096' AND invoice_dt='13-MAR-2007';

back to the first:

UPDATE ps_voucher SET address_seq_num=2 WHERE voucher_id='00692096';

and then the second:

UPDATE ps_voucher SET address_seq_num=2 WHERE voucher_id='00692096';

BAM! Deadlock. See screenshots:

[Screenshots: Deadlock - Session I and Deadlock - Session II]

What went wrong? There existed two vouchers in the system, with the same VOUCHER_ID but with different INVOICE_DTs (invoice dates). Each process first locked up one of those vouchers, and then – as the second UPDATE – tried to update both. (On the database side, a process gets a lock on a specific row when it UPDATEs that row, and the lock is released when the process COMMITs or ROLLBACKs.)

Yes, the programmer could have been smarter and written better code: had he put the INVOICE_DT clause in the second statement as well, we would have been fine. However, in practice, with huge systems having tons of code, programmers will sometimes make mistakes. Even if they do not, deadlocks will occur: not all deadlocks are caused by SQL issues.

From a system design perspective, what can be done to prevent deadlocks? One way is for each execution of a process to have a unique ID – let’s call it the process instance (PI). So if a process ABC is run once, it will have a PI of 1222; if process PQR is run after that, it will have a PI of 1223; and when ABC is run the next time, it will have a PI of 1224. Before changing any transactions, the process can update its own PI on the transactions that qualify:

UPDATE ps_voucher
SET pi=1223
WHERE <process specific selection criteria>
AND pi=0;

COMMIT;

The commit here is important – only then will other processes be able to see the ‘locking’.

Thereafter the normal processing SQLs can be changed as below:

UPDATE ps_voucher
SET grp_ap_id='1'
WHERE <process specific criteria>
AND pi=1223;

At the end, set the transactions back to ‘open for processing’ by setting PI to zero:

UPDATE ps_voucher
SET pi=0
WHERE pi=1223;
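The three steps (claim, process, release) can be tried out end to end with the sketch below. It uses an in-memory SQLite database purely as a stand-in for Oracle, runs both “processes” one after the other on a single connection, and the helper names `claim` and `release` as well as the sample voucher IDs are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ps_voucher (voucher_id TEXT, grp_ap_id TEXT, pi INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO ps_voucher (voucher_id) VALUES (?)", [("V1",), ("V2",), ("V3",)]
)
conn.commit()

def claim(conn, pi, voucher_ids):
    """Stamp unclaimed rows (pi = 0) with this run's process instance, then commit."""
    marks = ",".join("?" for _ in voucher_ids)
    cur = conn.execute(
        f"UPDATE ps_voucher SET pi = ? WHERE voucher_id IN ({marks}) AND pi = 0",
        [pi, *voucher_ids],
    )
    conn.commit()
    return cur.rowcount  # number of rows this run actually got

def release(conn, pi):
    """Set the rows back to 'open for processing'."""
    conn.execute("UPDATE ps_voucher SET pi = 0 WHERE pi = ?", (pi,))
    conn.commit()

first = claim(conn, 1222, ["V1", "V2"])   # run 1222 wants V1 and V2
second = claim(conn, 1223, ["V2", "V3"])  # run 1223 wants V2 and V3, but V2 is taken

# Normal processing for run 1223 touches only the rows it has claimed.
conn.execute("UPDATE ps_voucher SET grp_ap_id = '1' WHERE pi = ?", (1223,))
conn.commit()

release(conn, 1222)
release(conn, 1223)
print(first, second)
```

Run 1223 ends up with only one row, so the two runs never wait on each other’s rows. The voucher it could not claim has to be picked up in a later run.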

If there are other ways to achieve this, please let me know by posting comments.

The DBA is usually able to identify the SQL queries involved in a deadlock. Many times one process is UPDATEing the rows that the other is DELETEing.


Top 10 considerations when preparing a software test plan


-> Test the parts of the application that have changed since the last cycle / go live

This part of the test plan is very obvious: test the changes to the application. Each change needs to be tested individually if possible, or in groups if the number of changes is large.

For example, if you added a new field called ‘maximum pay by date’ to the voucher batch interface, then you could test the interface for this – having both data with this date entered, and with this date set to blank.

There is nothing more to this one – it’s normally the facet of testing that does receive due focus.

-> Test sampled parts of the application that have NOT changed

Now we come to something that does NOT receive the due focus: the parts of the application that remained unchanged. Testing these is what is usually called regression testing. No, you do not have to test ALL of it. If you can test all of the application (especially with automated tools, as discussed below), there is nothing like it. However, at least test 10-15% of the functionality that has not changed.

For example – as discussed above – if you changed the voucher batch interface, then you can test the online voucher entry. Under the online voucher entry, test at least one scenario that has not changed.

The rule of thumb is that if, in a module having 100 test cases, 40 have changed, then test those 40 that have changed, and test 6-10 of the 60 that have not.
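The rule of thumb is easy to turn into a small planning helper. This is a sketch under assumptions: the test case IDs are made up, the function name `plan_test_run` is hypothetical, and the sample is picked at random at the lower end of the range (10%).

```python
import random

def plan_test_run(cases, changed, sample_percent=10, seed=0):
    """Return every changed case plus a random sample of the unchanged ones."""
    changed_cases = [c for c in cases if c in changed]
    unchanged = [c for c in cases if c not in changed]
    # Round the sample size up, using integer arithmetic only.
    sample_size = (len(unchanged) * sample_percent + 99) // 100
    sampled = random.Random(seed).sample(unchanged, sample_size)
    return changed_cases, sampled

cases = [f"TC{n:03d}" for n in range(1, 101)]
changed = set(cases[:40])
must_run, unchanged_sample = plan_test_run(cases, changed)
print(len(must_run), len(unchanged_sample))
```

Changing the seed from cycle to cycle means a different slice of the unchanged functionality gets exercised each time.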

-> Look at it from the end user’s perspective: do one full cycle end to end

Next to include in the plan is something you can call integration testing: if your application is about users entering vouchers and getting paid, perform this cycle as a user would. Many times we IT folks test only our own application, the one we are developing, and forget the rest of the glue technology. It falls into the category of things we want to do, yet are lazy about, so we find some short-cuts.

Once I was asked to carry out testing for a reconciliation report that had already been tested by the developers. I uploaded the same input twice, which ended up showing double on the final report. It turned out that the developer had missed this because he tested only on the basis of data that already existed in the system, and did not upload any new vouchers.

-> Stress testing


Stress testing should again be a very critical part of your test plan. How many users are expected to use the application during normal hours? During peak hours? Plan for all such scenarios. Also design the business process that would take place if the application does fail: the idea should be that the user’s work doesn’t get halted.

There are stress testing tools available both free and commercial that you can use to simulate users.

In one of my projects, a web application that was created for 800 users, failed under a load of 35. Increasing the number of processors, or the number of server boxes is not a guaranteed way of handling load on the application: the application has to be designed to support the load from the ground up, and tested suitably.
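For a first rough check before reaching for a full tool, simulated users can be as simple as threads calling the same action at once. The sketch below assumes a placeholder `handle_request` function standing in for one user action; a real test would replace it with a call to the application.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(user_id):
    """Stand-in for one user action against the application (hypothetical)."""
    time.sleep(0.01)

def run_load(users, action):
    """Fire `users` concurrent calls; report the failures and the slowest response."""
    def timed(user_id):
        start = time.perf_counter()
        try:
            action(user_id)
            ok = True
        except Exception:
            ok = False
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=users) as pool:
        results = list(pool.map(timed, range(users)))
    failures = sum(1 for ok, _ in results if not ok)
    slowest = max(elapsed for _, elapsed in results)
    return failures, slowest

for users in (5, 35):
    failures, slowest = run_load(users, handle_request)
    print(f"{users} users: {failures} failures, slowest {slowest:.3f}s")
```

Stepping the user count up gradually shows the load at which failures start or response times stop being acceptable.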

-> Performance testing


How long does a file take to get processed? How long does the user expect it to take? How long does it take for the screen to open/save?

The user expectation part is sometimes ignored. Please go ask the users of your application now what their expectation is – or it might already be too late in terms of coding.

The developers might think that if a process runs for one hour, it’s good enough. However, the users might need to run it six times a day during the closing period, so one hour might not be fast enough. In one such scenario we had to run four parallel instances of a process to achieve the user-specified timing.

-> Concurrency testing

Can two different instances of the new process run together? The panel you just created: can it be used by two persons at the same time? Does it cause deadlocks at the database level if 100 instances of the process are run together?

Can two different versions of the application exist on the same machine?

These are the kind of questions that you ask yourself while working on the ‘concurrency’ aspect of test plan/execution.

A team of developers once needed to clone a process and create slightly different functionality. However, it turned out that when both processes were run together, 1 time in 10 one of them would fail. This was noted after go live 🙂 The cause turned out to be incorrect use of the shared temporary tables by one of the processes.

If you are interested in deadlocks technically, please read my posting: “Oracle deadlocks: the what and the how”.

-> Unit test before Integration testing

Our laziness at work again: we ‘trust’ our work and want to move directly to integration testing. Partially, the waterfall model of software development is also to blame here.

99% of the time, after the developer moves directly to integration testing, the very first test case for the application fails, and the developer comes back to the unit testing phase. 🙂

Unit testing is a very critical part of your test plan – if you do it right, you will find hundreds of issues that would otherwise never get detected, not even during integration testing.

Build ‘driver modules’ to iterate through all the ‘ifs and whiles’ that have been coded. Try out all avenues control can flow through.
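A driver module can be as small as a table of inputs with one row per branch. In the sketch below, `shipping_fee` is a made-up unit under test and `run_driver` is a hypothetical helper; the point is only that every ‘if’ gets visited at least once.

```python
def shipping_fee(weight_kg, express):
    """Hypothetical unit under test with a few branches."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    fee = 5 if weight_kg <= 1 else 5 + 2 * (weight_kg - 1)
    if express:
        fee *= 2
    return fee

# The driver data: one row per path we want control to flow through.
CASES = [
    ((0.5, False), 5),       # light parcel, normal delivery
    ((0.5, True), 10),       # light parcel, express
    ((3, False), 9),         # heavy parcel, normal delivery
    ((3, True), 18),         # heavy parcel, express
    ((0, False), ValueError) # invalid weight must be rejected
]

def run_driver(func, cases):
    """Call func with each input and collect the cases that misbehave."""
    failures = []
    for args, expected in cases:
        try:
            got = func(*args)
        except Exception as exc:
            got = type(exc)
        if got != expected:
            failures.append((args, expected, got))
    return failures

print(run_driver(shipping_fee, CASES))
```

An empty list means every path behaved as expected; anything else names the input that failed and what came back.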

-> Create test history

Creation of a test history is as important as doing the testing. Being able to, at a later date, answer such questions as: ‘what are cases we tested?’, ‘what are the problems we found?‘ etc is very helpful. Showing a clean slate (a ‘pass’ on all test cases) at the end of all our test iterations is not so helpful. In short, record the problems found, even though they may get corrected later on.

-> Automated testing

Automated testing solutions can be a big help. It does not mean that all testing be delegated to the automated testing mechanism: but it can definitely be an add-on to your manual testing.
When changing the order entry functionality, for example, use it to enter 1000 different orders. There are several solutions available (search the web) that will record the user actions, and will repeat those actions later with different data.
At a very simple level, AutoIt is a great tool for automated data entry, and is free to use. It’s very flexible and has a great library of functions built into its scripting language. I use it all the time, and not just for testing!

-> Code review

While we focus on all these great ways of testing let us not forget our tried and tested workhorse: code review. Being humans, we are tempted to feel that by doing better testing (being easier to do) we can offset the need for a good code review, but there are hundreds of reasons to do code review.
There may be some program flows designed for rare situations which may never get tested. Code review in such a case will contribute ideas for such test cases. Documentation may not be in sync with the code, with the potential to make future changes difficult. There may be code improvements possible: for example, replacing an ‘if condition’ with a more specific check.

There are other things, depending on your scope you may also want to include them:

-> Knowledge transfer/competence testing

-> Backup & recovery testing

All the best, post your comments here.


Scripted thumbnail generation: security perspective

While searching for something on the net, I came across some scripts that generate image thumbnails on the fly.

For example: http://tech.mikelopez.info/2006/03/02/php-image-resize-script/.

While using such scripts we should be aware of the security point of view: your site can easily become a proxy for other people or websites.


When NOT to normalise the database

When talking of Database Normalisation, textbooks often talk of BCNF, fifth and higher normal forms. However, in practice (in large software/ERPs) I have rarely noticed normalisation beyond Third Normal form. In fact, there is a certain degree of redundancy that is desirable.

While doing database design, I believe there are two critical aspects that should be kept in mind but I see ignored in a lot of common software.

The first is the time aspect of data. First – an example from finance. Consider a company having multicurrency invoicing. The tables can be designed as:

INVOICE: InvoiceID, ..., Currency, BaseCurrency, TransactionDate, ...
CONVERSIONS: FromCurrency, ToCurrency, EffectiveDate, RateMultiplier

This is a design having no redundancy. On the basis of the three fields in the INVOICE relation, we can always find out the latest row from the CONVERSIONS table having EffectiveDate less than TransactionDate. Hence we can determine the RateMultiplier.

Consider another design:

INVOICE: InvoiceID, …, Currency, BaseCurrency, TransactionDate, RateMultiplier, …
CONVERSIONS: FromCurrency, ToCurrency, EffectiveDate, RateMultiplier

Here, the system determines the value of the RateMultiplier at the time of invoice creation and records it permanently within the INVOICE table itself. To me this would be the more mature design. Why? Because a lot of data in the INVOICE table actually depends on the RateMultiplier: for example, the VAT details. Suppose that on 1-JAN-2009 the exchange rate is recorded as 1.1, and on 3-JAN-2009 we come to know that the rate was recorded incorrectly. Someone changes the CONVERSIONS table to reflect the corrected exchange rate of 1.2. With the first design, all the invoices created between 1-JAN and 3-JAN become inconsistent: their base currency amounts were calculated at 1.1, but the rate looked up from CONVERSIONS is now 1.2. With the second design, each invoice still carries the rate that was actually used.
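The scenario can be replayed in a few lines. This is a sketch with made-up data: the invoice amount of 200, the earlier rate row and the helper names are assumptions, and the lookup treats an EffectiveDate on the transaction date itself as applicable.

```python
from bisect import bisect_right
from datetime import date

# CONVERSIONS rows for one currency pair, sorted by EffectiveDate (sample data).
conversions = [(date(2008, 12, 1), 1.0), (date(2009, 1, 1), 1.1)]

def rate_on(day):
    """Latest RateMultiplier whose EffectiveDate is on or before `day`."""
    dates = [effective for effective, _ in conversions]
    idx = bisect_right(dates, day) - 1
    if idx < 0:
        raise LookupError("no rate effective on or before " + day.isoformat())
    return conversions[idx][1]

def create_invoice(amount, day):
    """Second design: snapshot the rate on the invoice when it is created."""
    rate = rate_on(day)
    return {"amount": amount, "date": day, "rate": rate,
            "base_amount": round(amount * rate, 2)}

invoice = create_invoice(200.0, date(2009, 1, 2))

# On 3-JAN someone corrects the rate that was effective from 1-JAN.
conversions[1] = (date(2009, 1, 1), 1.2)

looked_up_now = rate_on(invoice["date"])
print(invoice["rate"], invoice["base_amount"], looked_up_now)
```

The stored rate (1.1) still explains the stored base amount (220.0), while a fresh lookup now returns 1.2. Without the snapshot there would be no record of why the invoice says 220.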

Now consider an example from HR appraisal systems. A table stores what stage an appraisal process is at for a particular employee. This is then used to decide what access he has.

STAGE_CURRENT: EmpID, Stage

Note that this has no Date, or Year field. An employee is trying to see records for the previous year appraisals, yet is unable to see some of the data, because current appraisal process is still in initial stage.

The next problem is that of storing “under calculation” fields. For example, consider that the training department maintains the scores of each student trained. The test administered is out of 100 marks, but has a weightage of 40%. Proposed design:

SCORES: CandidateID, TestID, Score, Flag

At the time of recording, the Flag is set to N. Thereafter a process runs that multiplies the score by 0.4 and sets the Flag to Y.

In my opinion a better design would be to retain both the scores even though the pre-weightage score is not relevant to the business process, because a process can also terminate in between due to erroneous data being supplied. Hence if the process ends after setting the flag to Y, and before changing the score; or in reverse order: after changing the score and before setting the flag then we end up with inconsistent data. Improved design:

SCORES: CandidateID, TestID, Score, WeightedScore

At the time of recording, Score is entered and WeightedScore is set to zero. Thereafter a process runs that multiplies the Score by 0.4 and stores the value in WeightedScore.

The central idea is to retain all information permanently so that even if the process fails, we know what data existed.
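Here is a small sketch of why the improved design survives a failed run. The rows, the bad value and the function name `apply_weightage` are invented for the illustration; the failure is simulated by a missing score.

```python
scores = [
    {"CandidateID": 1, "TestID": 7, "Score": 80, "WeightedScore": 0},
    {"CandidateID": 2, "TestID": 7, "Score": None, "WeightedScore": 0},  # bad data
    {"CandidateID": 3, "TestID": 7, "Score": 55, "WeightedScore": 0},
]

def apply_weightage(rows, weight=0.4):
    """Fill WeightedScore from Score; the raw Score is never overwritten."""
    for row in rows:
        row["WeightedScore"] = round(row["Score"] * weight, 2)

try:
    apply_weightage(scores)   # stops at candidate 2 with a TypeError
except TypeError:
    scores[1]["Score"] = 70   # the erroneous data is corrected
    apply_weightage(scores)   # safe to run again from the top

print([row["WeightedScore"] for row in scores])
```

Candidate 1 is processed twice and still ends up with the right value, because the calculation always starts from the untouched Score. With the Flag design, a rerun would have to trust the flag to know which scores had already been multiplied.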


Hiding WordPress categories

Photo by John Poyntz Tyler

When I wrote my first WordPress related post, I admitted that I was only doing it to attract traffic and it would be my last post on the subject. However, I start again. This time around, however, I want to talk about something which isn’t common knowledge and neither did I get any responses on the official WordPress forum regarding this.

Suppose you do not want some of your posts to appear anywhere: not the homepage, not the RSS feeds, not the archives: nowhere. However you DO want it to appear only when it’s linked to, as a single post on the page. I regularly need to do this, because some part of the post is more like an ‘addendum’ or when including everything would make the post too long.

There is a standard solution available on the forums: creating a plugin and adding code to this effect:


function hs_cat_exclude($query)
{
    if ($query->is_feed || $query->is_home || $query->is_archive) {
        $hsq = $query->get('cat');
        // get() returns an empty string when no category is set
        if (empty($hsq)) {
            $hsq = '-22';
        }
        else {
            $hsq = $hsq . ",-22";
        }
        $query->set('cat', $hsq);
    }
    return $query;
}

add_filter('pre_get_posts','hs_cat_exclude');

Here, 22 is the category number of the category I wanted to exclude.

This code works fine, but the moment you add is_category to the ‘if‘ clause, it doesn’t work for the category page. This was perplexing to me, and I did not understand it. I spent a long time and then decided to dig deeper. I found out that the ‘wiring’ is faulty (this is what I believe). It can’t work like this for the category page. What is needed additionally is something like this:


function hs_cat_exclude_cat($where)
{
    global $wp_query;
    if ($wp_query->is_category) {
        $where = $where . " AND NOT EXISTS(SELECT 1 FROM wp_term_relationships WHERE wp_term_relationships.object_id=wp_posts.id AND wp_term_relationships.term_taxonomy_id='22')";
    }
    return $where;
}

add_filter('posts_where','hs_cat_exclude_cat');

So far so good. What I wanted over and above this though, is for the category to not even appear on the category widget. I tried to find a way to get this done through the plugin but it did not work. Ultimately I had to ‘hack’ one of the core files to achieve this. If anyone knows of a better way to accomplish this, please add a comment. The change is to wp-includes/widgets.php. Find the line of code that looks like:

$cat_args = array('orderby' => 'name', 'show_count' => $c, 'hierarchical' => $h);

and add an ‘exclude’ clause like this:

$cat_args = array('orderby' => 'name', 'show_count' => $c, 'hierarchical' => $h, 'exclude' => 22);

That’s all there is to it.

Update Jan 12th 2010: Since WP 2.8 I believe (I noticed the problem in 2.9.1), the last change above needs to be done to default-widgets.php rather than widgets.php


Licensing and information about the blog available here.