Big Data

Big data can be explained by understanding the following key aspects:

  1. Volume. There is no fixed threshold, in terabytes or otherwise, above which data counts as big data. What one considers the threshold depends on perspective and on the year we are in (big data is a moving target). However, this large volume of data is mostly the cost-free byproduct of digital interaction, such as consumers buying from online shops.
  2. Velocity. The speed at which data is generated and processed. Big data is often available in real time.
  3. Variety. The type and nature of data. Big data draws from text, images, audio and video, and it fills in missing pieces through data fusion.
  4. Data must be processed with advanced tools (analytics and algorithms) to reveal meaningful information. Big data is not about asking why, but about detecting patterns. Information generation algorithms must detect and address visible and invisible issues and factors.
  5. Parallel computing tools are needed to handle the data.
  6. Often “inductive statistics” are used to infer laws (relationships, dependencies, causal effects) from large sets of data, or to predict outcomes and behaviours.

Artificial Intelligence

Artificial intelligence deals with mimicking the way the human brain works, the evolution of life, and other such natural phenomena. Here are some artificial intelligence techniques:

  1. Artificial Neural Networks
  2. Fuzzy logic
  3. Genetic algorithms
  4. Cellular automata

Artificial Neural Networks are inspired by the way the brain works. A neural network consists of a network of nodes. Each node is capable of making a simple decision or calculation on its inputs and providing an output. By interconnecting a large number of such nodes, it is possible to do data analysis and complex decision making. Each node is assigned a threshold, and each connection between nodes is assigned a weight. To begin with, a network with random weights and thresholds is constructed. It is then “trained” by providing inputs and comparing the outputs it gives against known, pre-calculated outputs. When a mismatch is detected between the current output and the desired output, the weights and thresholds are adjusted accordingly.
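The training loop described above can be sketched for the smallest possible case: a single node learning the logical AND of two inputs. This is only an illustration under stated assumptions. The function names, the learning rate and the choice of AND as the task are mine, and a real network would interconnect many such nodes.

```python
import random

def node_output(weights, threshold, inputs):
    """A node fires (1) when the weighted sum of its inputs reaches its threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0

def train_node(samples, epochs=1000, rate=0.1, seed=0):
    """Start from random weights and a random threshold, then adjust on every mismatch."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in samples[0][0]]
    threshold = rng.uniform(-1, 1)
    for _ in range(epochs):
        mismatches = 0
        for inputs, expected in samples:
            error = expected - node_output(weights, threshold, inputs)
            if error != 0:
                mismatches += 1
                # Nudge each weight towards the desired output...
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                # ...and move the threshold the opposite way.
                threshold -= rate * error
        if mismatches == 0:
            break  # every known output is reproduced
    return weights, threshold

# Known input/output pairs for logical AND (illustrative training data).
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, threshold = train_node(AND_SAMPLES)
for inputs, expected in AND_SAMPLES:
    print(inputs, node_output(weights, threshold, inputs), expected)
```

A single node can only learn patterns that a straight line can separate, which is why real networks connect many nodes in layers.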

Genetic algorithms are based on the process of evolution and natural selection. To begin with, a pool of random “algorithms” is built. Each such algorithm is tested with a given input and the required output. The algorithms that give results closest to the desired outcome are selected. The next generation is then built by combining pairs of algorithms from the previous generation and adding more steps (random mutation). This generation is again tested for fitness. The process is repeated as many times as needed. The algorithms keep increasing in complexity with each generation.
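The select, combine and mutate cycle can be sketched as below. To keep it short, this sketch makes a simplifying assumption that differs from the description above: each candidate is a fixed-length list of bits rather than an algorithm that grows in complexity, and fitness is simply the number of bits that match a target. All names and parameter values are illustrative.

```python
import random

def fitness(candidate, target):
    """Number of positions where the candidate matches the desired outcome."""
    return sum(1 for c, t in zip(candidate, target) if c == t)

def evolve(target, pool_size=30, generations=60, mutation_rate=0.05, seed=1):
    rng = random.Random(seed)
    length = len(target)
    # Start with a pool of random candidates.
    pool = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pool_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        # Test every candidate and keep the fitter half.
        pool.sort(key=lambda c: fitness(c, target), reverse=True)
        history.append(fitness(pool[0], target))
        survivors = pool[: pool_size // 2]
        children = [survivors[0][:]]  # carry the best candidate over unchanged
        while len(children) < pool_size:
            # Combine a pair of survivors at a random cut point.
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)
            child = mother[:cut] + father[cut:]
            # Random mutation: occasionally flip a bit.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pool = children
    best = max(pool, key=lambda c: fitness(c, target))
    history.append(fitness(best, target))
    return best, history

target = [1, 0] * 8
best, history = evolve(target)
print(fitness(best, target), "of", len(target), "bits match")
```

Carrying the best candidate over unchanged means the best fitness never gets worse from one generation to the next. That is a design choice of this sketch, not a requirement of genetic algorithms in general.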

Please refer to Cellular automaton for more information on cellular automata and Fuzzy logic for more information on Fuzzy Logic.


United States: Protest against changes to Rule 41

We stand in protest on June 21st 2016, against proposed changes to Rule 41. We are with EFF on this.


Updating Sony Xperia Z2 to Android Marshmallow

Having waited several days for the update to be pushed to my phone, I temporarily moved over from Airtel to Vodafone. Wow! The update was pushed within seconds of connecting the phone to the PC using the Xperia Companion software. Airtel seems to be hopelessly slow in pushing this update (or perhaps it decided not to go ahead with this one).

Bye Bye Lollipop… 🙂

(India)


Oracle performance tuning – an update

A tutorial is now available on Oracle performance tuning of applications and SQL statements. It has been expanded to include case studies, which should go a long way towards a better understanding of the concepts explained.

Link to the PDF tutorial here: Tuning.pdf

This current posting is an extension to the existing post on performance tuning, which you can still refer to – for more resources on the topic.


Oracle: Snapshot too old?

Okay, so you have received the Oracle error ORA-01555 Snapshot Too Old and have no clue how to go about resolving it? This post is made for you then. (The first time an application developer has written about this rather than a DBA.)

First, why does this occur? When you run a query, Oracle shows you the data as a consistent “snapshot” taken at the moment the query started. The underlying tables in that query might continue to change, but you will see the data as it was when you executed the query; Oracle rebuilds the older values from its undo data. You can keep moving through the rows of the snapshot using the cursor. However, as you might expect, Oracle cannot hold that snapshot forever. How long it retains the undo data behind the snapshot is governed by the UNDO_RETENTION parameter (a number of seconds).

So one way to solve this problem might be to increase the limit defined by this parameter. However, that is not always the best solution.

This problem normally occurs when a process opens a cursor (by running the query) and processes the rows one by one. For example, let’s assume the process runs a query that returns 10000 rows. Processing each row takes, on average, 10 seconds, and the process moves on to the next row only after finishing the previous one. Hence processing all these rows will take around 28 hours. If your UNDO_RETENTION is defined as 10 hours, this process will fail with the snapshot too old error.

One of the best ways to solve this problem is to tune the performance of this process. The tuning should target the part of the process that runs for each row of the query in question, and should aim to reduce the time it takes to process one row. For example, if we can get our processing time down to 3 seconds per row, we will be done in about 8.3 hours, which is below our current setting for UNDO_RETENTION. In most cases, this can actually be done. (Read more here and here.)
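The arithmetic behind these estimates is easy to check with a few lines of Python. The function names are only illustrative; the row count, per-row times and 10-hour retention window are the ones from the example.

```python
def total_hours(row_count, seconds_per_row):
    """Hours needed to process every row, one after another."""
    return row_count * seconds_per_row / 3600

def finishes_within_retention(row_count, seconds_per_row, undo_retention_hours):
    """True when the whole run fits inside the undo retention window."""
    return total_hours(row_count, seconds_per_row) < undo_retention_hours

print(round(total_hours(10000, 10), 1))          # 27.8
print(round(total_hours(10000, 3), 1))           # 8.3
print(finishes_within_retention(10000, 10, 10))  # False
print(finishes_within_retention(10000, 3, 10))   # True
```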

A second way to solve the problem is to use a temporary table. For example, suppose you want to analyse all open purchase orders. From the table containing POs, pull the ones that are open and put them into the temporary table. Since the temporary table is used only by your process, Oracle will not have to hold the “snapshot” for you. Again, the main driver query is the candidate for moving into a temporary table. This will also make your process faster overall if it is based on a not-so-small subset of a large table.

However, a third solution is also possible. For our problem we had a process that had to run for days and days, rather than doing something and finishing. So obviously, we got this error.

To solve the problem, we exited the loop after every n rows, and then reentered it. For example, if the pseudocode looked as below prior to the fix:


select something from somewhere;
while (rows) {
  do process
} 

We changed it as below:


hasAtleastOneRow = True;
while (hasAtleastOneRow) {
  hasAtleastOneRow = False;
  select something from somewhere where rownum<n;
  while (rows) {
    do process
    hasAtleastOneRow = True;
  }
} 

Note that the SELECT statement must have a mechanism to avoid picking up rows that have already been processed. This could be a flag condition or a check against another table. For example:

select po_id from po_table where po_status='O' and rownum<100
and not exists(select 1 from po_temp where po_table.po_id = po_temp.po_id)

As part of the 'do process' then, we should insert into po_temp.

How do we select the value of 'n'? You will have to do some timing and trial and error here. Try to pick the highest value for which one batch is guaranteed to finish well within the undo retention window.
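For readers who want to try the pattern without an Oracle instance, here is a minimal runnable sketch in Python. It rests on loud assumptions: it uses SQLite (from the standard library) as a stand-in for Oracle, so `LIMIT` takes the place of `rownum`, and the function names and sample data are made up for illustration. Only the table and column names (po_table, po_temp, po_id, po_status) come from the example above.

```python
import sqlite3

def process_in_batches(conn, batch_size, process):
    """Re-run the driver query for every batch so no cursor stays open for long."""
    processed = 0
    has_at_least_one_row = True
    while has_at_least_one_row:
        has_at_least_one_row = False
        # Fresh query each time; rows already recorded in po_temp are skipped.
        rows = conn.execute(
            "SELECT po_id FROM po_table "
            "WHERE po_status = 'O' "
            "AND NOT EXISTS (SELECT 1 FROM po_temp WHERE po_temp.po_id = po_table.po_id) "
            "LIMIT ?",
            (batch_size,),
        ).fetchall()
        for (po_id,) in rows:
            process(po_id)
            # Part of 'do process': remember that this row is done.
            conn.execute("INSERT INTO po_temp (po_id) VALUES (?)", (po_id,))
            has_at_least_one_row = True
            processed += 1
        conn.commit()
    return processed

def demo():
    """Build a small in-memory database (made-up data) and process it in batches of 3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE po_table (po_id INTEGER PRIMARY KEY, po_status TEXT)")
    conn.execute("CREATE TABLE po_temp (po_id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO po_table VALUES (?, ?)",
        [(i, "O" if i % 2 else "C") for i in range(1, 21)],
    )
    seen = []
    count = process_in_batches(conn, 3, seen.append)
    conn.close()
    return count, seen

count, seen = demo()
print(count, "open purchase orders processed")
```

The outer loop ends on the first pass whose query returns no rows, which is the same exit condition as the pseudocode.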


Releasing: Gurbani search for mobile

Gurbani searching on the mobile used to be tough. Symbian and Android phones do not support Unicode rendering for Indic scripts, which is required to use Gurmukhi websites. Opera Mini allowed us to read Gurbani, but entering search text is a different matter.

I have developed a web page that allows you to do this by entering roman letters instead. For example, enter ‘k’ instead of ‘ਕ’.

To use this go to: http://tinyurl.com/gfind.

For best results, mobile users can use this within Opera Mini after one configuration change: enter config: as a URL, which takes you to the advanced settings page; set Use bitmap fonts for complex scripts to Yes, and then Save. Opera Mini uses cloud computing to render Gurmukhi text, and therefore doesn’t need that support on the mobile itself.

That being said, this tool can be used anywhere you like – on the desktop or any mobile browser. Detailed instructions for using it are on the page itself.

I plan to update this in future to allow search options powered by various other websites, as and when more powerful search becomes available. Click here for a short history of Gurbani search.


Losing and getting back WordPress comments

This sleepy Saturday afternoon I logged into my blog, saw there were two comments awaiting approval, and noted that both were spam. So I marked them as such and purged them permanently. There was a warning sign that I should have taken note of, but did not until after the permanent deletion. Instead of two, 20 comments got deleted: along with the 2 spam ones, 18 meaningful ones were deleted as well.

BAM!!! I value each comment, as they sometimes add significant value to the content of the post. I was lost – what do I do?

First thought was to check my backup – I maintain a backup of my blog, however in this case the backup was more than 6 months old while these comments were less than a month old.

Not losing heart, I realised my hosting provider might have a backup. However, the provider is based in the US and had not opened yet. I left them a message. Reading the FAQ, I thought I might have a chance if I could get in touch with them quickly. They only maintained a copy of the most current data, hence contacting them sooner might save the day. That was not to be: they started work two hours later and then got back to me saying that the backup was about 30 minutes old, which was after the comments were deleted. Not good. I was in despair now.

It was time to go to sleep now, and there was barely anything more that I could have done. Next morning I woke up still with a feeling of sadness, and then I had a brainwave. I get an email whenever someone posts a message to my blog. This email contains the message posted. I could check my email box, search for these messages and repost them myself!!!

A quick search on my Nokia N97 revealed I still had those emails. Paradise regained!!!


7 habits of highly effective programmers

I recently came across a list of seven rules for beginning programmers. I could agree with only one of the rules – each procedure should have a purpose, an input and a defined output. However, programming is an ecosystem of related disciplines, and the rules ought to control not just coding. Here is my attempt:

1. A successful program is one which meets the customers’ requirements, is flexible (well designed) and requires least effort from the programmer. Such a successful program has three ingredients: Plan, Plan and Plan. If you intend to spend an hour doing the coding, spend 20 minutes up front to plan the design.

2. Procedures should not be created just because doing so is described as good programming practice. Each procedure should have a need, an objective and a clear input with a defined output. Procedures should not modify global variables, to the extent possible.

3. Unless you plan to complete the program in one sitting, or if you think you might need to tweak or debug it later (which is true in most cases), start maintaining notes on the highlights of the design as early as possible. I normally add comments within the code mentioning future enhancements or improvements to the coding design to be done as phase II.

4. Assume that your program will need debugging, and enable that while writing the code itself. Create a debug flag, and emit verbose details when that flag is enabled. This habit is the most difficult one to instil in beginner programmers.

5. When I need to develop code for a new requirement, in the ‘first draft’ I may write only the pseudo code for certain sections where I will need to check language features, or I am not sure of the syntax. Later, I grapple with completing the code. This allows me to focus on the overall algorithm during the first draft, and saves overall time debugging.

6. Simple coding of an efficient design will help, not the other way around. Even in the first draft one needs to focus on the efficiency of the code, but only from an algorithmic perspective. For example, if I need to extract data from a database, I will ensure I extract only the minimum number of rows I need. I will ignore the fact that the loop needs to do an extra iteration.

7. There are many more rules, and yet an effective programmer is one who knows when to break the rules. Coding is beyond all rules, like a poem. Quoting from The Tao of Programming by Geoffrey James:

A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity.

A program should follow the ‘Law of Least Astonishment’. What is this law? It is simply that the program should always respond to the user in the way that astonishes him least.
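As a small illustration of rule 4, the debug flag, here is a minimal Python sketch. The flag name, the helper function and the averaging example are all made up for illustration. In a real program the flag would more likely be wired to a command-line switch or an environment variable.

```python
import sys

DEBUG = False  # flip to True when you need to investigate a problem

def debug(message):
    """Emit a verbose detail, but only when the debug flag is enabled."""
    if DEBUG:
        print(f"[debug] {message}", file=sys.stderr)

def average(values):
    """Return the mean of a list of numbers, or 0.0 for an empty list."""
    debug(f"average() called with {len(values)} values")
    if not values:
        debug("empty input, returning 0.0")
        return 0.0
    total = sum(values)
    debug(f"total={total}")
    return total / len(values)

print(average([2, 4, 6]))
```

With the flag off, the program prints only its result. With it on, the same run also reports what it was given and what it computed along the way, without any change to the logic.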

Happy coding, please add comments if you can think of rules more important than the ones I have stated. I will too.


Leaving your wi-fi network open

Bruce Schneier (a leading US computer security expert) and the Electronic Frontier Foundation (EFF) advise leaving your wi-fi network open, meaning not using encryption protocols such as WEP or WPA2. This allows neighbours and passers-by to use it when in urgent need, and increases societal cooperation. EFF states:

“If you sometimes find yourself needing an open wireless network in order to check your email from a car, a street corner, or a park, you may have noticed that they’re getting harder to find.”

Due to privacy concerns, and also to avoid letting terrorists and anti-national elements use the spectrum – people are closing down their wi-fi networks. This is also the official advice from my ISP.

EFF argues, however, that allowing access to others is just another way of giving back to society. In addition, it argues that this allows more efficient use of spectrum compared to cell phone towers. It admits that current protocols are not designed for efficient sharing. The ideal protocol, as per EFF, would allow sharing part of your bandwidth while leaving the rest encrypted and closed to snooping. They are working on building such a protocol.

I would love to leave my network open: I do not use all my bandwidth, and in fact do not use it at all for several hours a day. I have an unlimited plan, so it would not be a financial burden. It would instead shift the burden to the ISPs, which I believe is fair, since they have restrictive trade practices too.

Given the current state of terrorism in India, however, I do not feel safe in doing so. America came to understand terrorism only a decade ago; we have felt it for the last several decades. We know that an open wi-fi network was used to claim responsibility for the Mumbai attacks of 26/11. I am also not sure of the legal protection in India, if any. The government machinery works on an ad hoc basis, even though we may claim to be the world’s largest democracy.

Please post thoughts / comments.


Preparing for the PMP

This post is based on an interview with Piyush Singhal who cleared PMP recently with a 90+ score.

Okay, so you are thinking about going for the Project Management Professional (PMP) exam, and do not know where to start. Let’s get you started. Below is a project plan for clearing the PMP certification.

The first thing to do is to obtain PMI membership. This might get you the PMBOK guide (Project Management Body of Knowledge) bundled. In addition, you get a discount on the exam fee and access to all PMI resources online for one year. You may also join a local chapter of PMI.

When Piyush set out on operation PMP, he realised it would not be possible for him to read the entire PMBOK. So he decided to use audio books instead, and loaded them onto his car audio system so that he could listen while travelling. These are available from pmprepcast.com. He spent about six months listening to these, around 5 hours a week.

At this point, he tried to assess himself. There are PMP-like sample tests available on various websites. Working through multiple tests in simulation mode gives you an idea of how you are placed against the real one. You can take these four-hour tests, always keeping the formula cheat sheet on hand.

For the last lap, he accepted three weeks of solitary confinement. Leave from office, away from family, away from TV: reading about eight to ten hours a day. Then he decided he had had enough, woke up one morning, and appeared for the test. Sounds easy, doesn’t it?


Multiple internet connections – single PC

Many of us have more than one internet connection these days. I have a wired broadband connection (slower but unlimited), and a 3G connection on my mobile (faster but limited to 2GB), which can be used on the PC via Nokia wifi tethering (Joikuspot).

I want my PC to use both at the same time, and have different applications use different connections. For example, I want Lotus Notes (or Outlook) and SSH to connect using the unlimited wired connection, but Firefox, which I use the most and which frustrates me if it’s slow, should be on the faster 3G connection. Your requirements might be different, but if you need to use two different connections, read on.

I use software called ForceBindIP for this. The step by step procedure:

  1. Connect your PC to both connections. I connect one via the Ethernet port and the other via wifi. Since I want all applications (except one) to use the Ethernet, I connect that one first and start the applications that should use it. Then I connect to the wifi.
  2. Open the command prompt (Go to Start->Run, and type cmd).
  3. Type ipconfig and press enter.
  4. You will see some output. We will need to figure out the IP address (four numbers, separated by dots) of the wifi connection. Look under Ethernet adapter Wireless Network Connection and against IP address. For me the number is 192.168.2.2
  5. Find out the path to the application you want to open linked to this new Internet connection. For example, “C:\Program Files\Mozilla Firefox\firefox.exe”
  6. Now, back in the command window, write something like:

    ForceBindIP -i 192.168.2.2 "C:\Program Files\Mozilla Firefox\firefox.exe"

  7. This should open up Firefox.

You are all set. Enjoy! If you face any problems, please do let me know via comments.
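For the curious, the underlying idea is that an application’s network sockets can be tied to the local IP address of one particular connection. The Python sketch below shows that idea using the standard library’s `source_address` option. It is not how ForceBindIP is implemented, and the function names are made up. The demo runs over the loopback address so it works without two real connections; on a real PC you would pass the wifi address found in step 4.

```python
import socket

def connect_via(local_ip, remote_host, remote_port, timeout=5.0):
    """Open a TCP connection whose local end is bound to local_ip."""
    return socket.create_connection(
        (remote_host, remote_port),
        timeout=timeout,
        source_address=(local_ip, 0),  # port 0 lets the OS pick a free local port
    )

def loopback_demo():
    """Connect to a throwaway local server and report the local address used."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]
    try:
        with connect_via("127.0.0.1", "127.0.0.1", port) as client:
            return client.getsockname()[0]
    finally:
        server.close()

print(loopback_demo())
```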

