Big Data

Big data can be explained by understanding the following key aspects:

  1. Volume. There is no specific threshold above which a volume of data qualifies as big data; what counts as “big” depends on the perspective and the year we are in (big data is a moving target). However, this large volume of data is mostly the cost-free byproduct of digital interaction, such as consumers buying from online shops.
  2. Velocity. The speed at which data is generated and processed. Big data is often available in real time.
  3. Variety. The type and nature of the data. Big data draws from text, images, audio and video, and it completes missing pieces through data fusion.
  4. Data must be processed with advanced tools (analytics and algorithms) to reveal meaningful information. Big data is not about asking why, but about detecting patterns. The algorithms that generate this information must detect and address both visible and invisible issues and factors.
  5. Parallel computing tools are needed to handle data at this scale.
  6. Often “inductive statistics” are used to infer laws (relationships, dependencies, causal effects) from large sets of data and to predict outcomes and behaviours (see the sketch after this list).
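
As a minimal sketch of point 6, the snippet below infers a simple linear relationship from observed data and uses it to predict an unseen outcome. The data and the choice of a linear model are assumptions made purely for illustration, not part of any particular big data system.

import numpy as np

# Hypothetical observations: advertising spend (thousands) vs. units sold
spend = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
units = np.array([12.0, 19.0, 31.0, 42.0, 50.0])

# "Induce" a simple law from the data with a least-squares linear fit
slope, intercept = np.polyfit(spend, units, 1)
print(f"inferred relationship: units ~= {slope:.1f} * spend + {intercept:.1f}")

# Use the inferred relationship to predict the outcome for an unseen input
print(f"predicted units at spend = 6.0: {slope * 6.0 + intercept:.1f}")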

Artificial Intelligence


Artificial intelligence deals with mimicking natural phenomena such as the way the human brain works or the evolution of life. Here are some of the artificial intelligence techniques:

  1. Artificial Neural Networks
  2. Fuzzy logic
  3. Genetic algorithms
  4. Cellular automata

Artificial Neural Networks are inspired by the way the brain works. A neural network consists of a network of nodes. Each node is capable of making a simple decision or calculation on its inputs and providing an output. By interconnecting a large number of such nodes, it is possible to do data analysis and complex decision making. Each node has a threshold assigned, and each connection between nodes is assigned a weight. To begin with, a random network is constructed. This is then “trained” by providing inputs and comparing the outputs it produces against pre-calculated known outputs. When a mismatch is detected between the current output and the expected output, the weights and thresholds are suitably modified.
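
Here is a minimal sketch of this training idea, reduced to a single node (a perceptron) learning the logical AND function. The training data, learning rate and update rule are illustrative assumptions; a real network would interconnect many such nodes.

import random

# Training data: inputs together with the pre-calculated known outputs (logical AND)
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

# A random node to begin with: random connection weights and a random threshold
weights = [random.uniform(-1, 1), random.uniform(-1, 1)]
threshold = random.uniform(-1, 1)
rate = 0.1  # how strongly a mismatch modifies the weights and threshold

def node_output(x1, x2):
    # The node makes a simple decision: fire (1) if the weighted sum crosses the threshold
    return 1 if x1 * weights[0] + x2 * weights[1] >= threshold else 0

for epoch in range(100):
    for (x1, x2), expected in samples:
        output = node_output(x1, x2)
        if output != expected:
            # Mismatch between current and expected output: modify weights and threshold
            error = expected - output
            weights[0] += rate * error * x1
            weights[1] += rate * error * x2
            threshold -= rate * error

for (x1, x2), expected in samples:
    print((x1, x2), "->", node_output(x1, x2), "expected", expected)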

Genetic algorithms are based on the process of evolution and natural selection. To begin with, a pool of random “algorithms” is built. Each such algorithm is tested against a given input and its required output. Those algorithms whose results come closest to the desired outcome are selected. The next generation is then built by combining pairs of algorithms from the previous generation and adding further steps (random mutation). This generation is again tested for fitness, and the process is repeated as many times as needed. The algorithms keep increasing in complexity with each generation.
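A minimal sketch of the idea follows. To keep it short, each “algorithm” in the pool is represented as a plain bit string scored against a desired target output; the target, pool size and mutation rate are arbitrary illustrative choices rather than a recipe for evolving real programs.

import random

TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]  # the desired outcome (made up)
POOL_SIZE = 20

def fitness(candidate):
    # Higher is better: how closely the candidate matches the desired outcome
    return sum(1 for a, b in zip(candidate, TARGET) if a == b)

# Build the initial pool of random candidates
pool = [[random.randint(0, 1) for _ in TARGET] for _ in range(POOL_SIZE)]

for generation in range(50):
    # Select those candidates whose results come closest to the desired outcome
    pool.sort(key=fitness, reverse=True)
    parents = pool[: POOL_SIZE // 2]
    if fitness(parents[0]) == len(TARGET):
        break
    # Build the next generation by combining pairs from the previous generation...
    children = []
    while len(children) < POOL_SIZE:
        mother, father = random.sample(parents, 2)
        cut = random.randint(1, len(TARGET) - 1)
        child = mother[:cut] + father[cut:]
        # ...and adding random mutation
        if random.random() < 0.2:
            spot = random.randrange(len(TARGET))
            child[spot] = 1 - child[spot]
        children.append(child)
    pool = children

best = max(pool, key=fitness)
print("generation", generation, "best candidate", best, "fitness", fitness(best))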

Please refer to Cellular automaton for more information on cellular automata and Fuzzy logic for more information on fuzzy logic.


United States: Protest against changes to Rule 41

We stand in protest on June 21st, 2016, against the proposed changes to Rule 41. We are with the EFF on this.

Rule 41 Protest




Updating Sony Xperia Z2 to Android Marshmallow

Having waited several days for the update to be pushed to my phone, I temporarily moved over from Airtel to Vodafone. Wow! The update was pushed within seconds – after I connected the phone to the PC using the Xperia Companion software. Airtel seems to be helplessly slow in pushing this update (or perhaps it decided not to go ahead with this one).

Bye Bye Lollipop… 🙂

(India)

Android Marshmallow

Oracle performance tuning – an update

A tutorial on Oracle performance tuning of applications and SQL statements is now available. The tutorial has been expanded to include case studies, which go a long way towards a better understanding of the concepts explained.

Link to the PDF tutorial here: Tuning.pdf

This post is an extension of the existing post on performance tuning, which you can still refer to for more resources on the topic.

Tuning


Oracle: Snapshot too old?

Berlin Wall
Okay, so you have received the Oracle error ORA-01555 Snapshot Too Old and have no clue how to go about resolving it? Then this post is for you. (For once, written by an application developer rather than a DBA.)

First, why does this occur? When you run a query, Oracle retains the data it returns as a consistent “snapshot”. The underlying tables in that query might continue to be changed, but you will see the data as it was when you executed the query. You can keep moving back and forth through the rows of the snapshot using the cursor. However, as you might expect, Oracle cannot hold that snapshot forever. How long it retains the snapshot is governed by the UNDO_RETENTION parameter (specified in seconds).

So one way to solve this problem might be to increase the limit defined by this parameter. However, that is not always the best solution.

This problem normally occurs when a process opens a cursor (by running the query) and then processes the rows one by one. For example, let’s assume the query returns 10,000 rows and processing each row takes, on average, 10 seconds, with the process moving on to the next row only after finishing the previous one. The total processing of all these rows will then take 10,000 × 10 seconds, which is around 28 hours. If your UNDO_RETENTION is defined as 10 hours, this process will fail with the snapshot too old error.

One of the best ways to solve this problem is performance tuning of this process. This should be targeted specifically at the part of the process that runs for each row returned by the query in question, with the aim of reducing the time it takes to process one row. For example, if we can get the per-row processing time down to 3 seconds, we will be done within about 8.3 hours, which is below our current setting for UNDO_RETENTION. In most cases, this can actually be done. (Read more here and here.)

A second way to solve the problem is to use a temporary table. For example, suppose you want to analyse all open purchase orders. From the table containing POs, pull the ones that are open and put them into the temporary table. Since the temporary table is used only by your process, Oracle will not have to hold the “snapshot” of the original table for you. Again, the main driver query is the candidate for staging into the temporary table. This will also make your process faster overall if it is based on a not-so-small subset of a large table. A rough sketch of this staging step follows below.
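
Here is that sketch, using the python-oracledb driver. The connection details and the po_table and po_stage names are assumptions, and the global temporary table itself is assumed to have been created beforehand (with ON COMMIT PRESERVE ROWS) by the DBA:

import oracledb

# Placeholder connection details
conn = oracledb.connect(user="app", password="secret", dsn="dbhost/orclpdb1")
cur = conn.cursor()

# Stage only the open purchase orders into the session-private temporary table.
# The long-running processing then drives off po_stage instead of po_table,
# so Oracle does not have to hold a long-lived snapshot of po_table for us.
cur.execute("""
    insert into po_stage (po_id)
    select po_id from po_table where po_status = 'O'
""")
conn.commit()

cur.execute("select po_id from po_stage")
for (po_id,) in cur:
    pass  # do process for each open purchase order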

However, a third solution is also possible. In our case we had a process that had to run for days and days, rather than doing something and finishing, so obviously we kept hitting this error.

To solve the problem, we exited the loop after every n rows and then re-entered it. For example, if the pseudocode looked as below prior to the fix:


select something from somewhere;
while (rows) {
  do process
} 

We changed it as below:


hasAtleastOneRow = True;
while (hasAtleastOneRow) {
  hasAtleastOneRow = False;
  select something from somewhere where rownum<n;
  while (rows) {
    do process
    hasAtleastOneRow = True;
  }
} 

Note that the SELECT statement must have a mechanism to prevent picking up rows that have already been processed earlier. This could be a flag column or a check against another table. For example:

select po_id from po_table where po_status='O' and rownum<100
and not exists(select 1 from po_temp where po_table.po_id = po_temp.po_id)

As part of the 'do process' step, then, we should insert the processed po_id into po_temp.

How do we select the value of 'n'? You will have to do some timing and trial-and-error here. Keep it at the highest value for which the processing time of one batch is still comfortably below the undo retention window.
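
Putting the pieces of this third solution together, here is a rough sketch of the batch-and-restart loop in Python with the python-oracledb driver. The connection details, the batch size and the process_po helper are assumptions made for illustration:

import oracledb

BATCH_SIZE = 100  # the value of 'n'; tune it against your undo retention window

conn = oracledb.connect(user="app", password="secret", dsn="dbhost/orclpdb1")
cur = conn.cursor()

def process_po(po_id):
    pass  # placeholder for the real per-row work ("do process")

has_at_least_one_row = True
while has_at_least_one_row:
    has_at_least_one_row = False
    # Re-open the query for the next small batch of unprocessed purchase orders
    cur.execute("""
        select po_id from po_table
        where po_status = 'O'
          and rownum < :batch
          and not exists (select 1 from po_temp
                          where po_temp.po_id = po_table.po_id)
    """, {"batch": BATCH_SIZE})
    for (po_id,) in cur.fetchall():
        process_po(po_id)
        # Record the row as processed so the next batch skips it
        cur.execute("insert into po_temp (po_id) values (:id)", {"id": po_id})
        has_at_least_one_row = True
    conn.commit()

Because each iteration opens a fresh query, no single read-consistent view needs to be held for longer than one batch, which is what keeps the process clear of ORA-01555.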
