This short tutorial quickly guides you through the most important aspects of Dumbo.
- Prerequisites
- Simple example: Counting IPs
- Mapper and reducer classes
- Jobs and runners
- Programs and starters
- Input formats
- Eggs and jars
- Further reading
Prerequisites
In the remainder of this tutorial, it is assumed that you successfully completed all installation steps described in
Building and installing for a recent Dumbo version. This tutorial also requires you to already be (a bit) familiar with
MapReduce, since the goal is to explain Dumbo, not MapReduce.
Simple example: Counting IPs
Suppose that you want to generate a top five of the IPs that occur most frequently in a given
Apache access log file access.log. In
UNIX, this can be done as follows:
$ cut -d ' ' -f 1 access.log | sort | uniq -c | sort -nr | head -n 5
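Each line of an Apache access log starts with the client IP, followed by the remaining fields separated by spaces, which is why cut -d ' ' -f 1 extracts the IPs. A made-up example line, purely for illustration:
66.249.65.107 - - [10/Dec/2008:13:55:36 -0800] "GET /index.html HTTP/1.1" 200 2326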
The following Dumbo program
ipcount.py provides an alternative solution:
def mapper(key, value):
    yield value.split(" ")[0], 1

def reducer(key, values):
    yield key, sum(values)

if __name__ == "__main__":
    import dumbo
    dumbo.run(mapper, reducer, combiner=reducer)
When you run this program by executing the commands
$ dumbo start ipcount.py -input access.log -output ipcounts
$ dumbo cat ipcounts | sort -k2,2nr | head -n 5
you actually do something very similar to the
UNIX command given above. The mapper and reducer run as separate
UNIX processes, and the output from the mapper process is piped through
sort before it goes to the reducer process. However, by adding the
-hadoop option, this Dumbo program can be run on a whole
Hadoop cluster instead of a single
UNIX
machine, which allows you to generate the top 5 for gigabytes or even
terabytes of weblogs. For instance, if Hadoop is installed in
/usr/local/hadoop on the machine from which you run your jobs, and your weblogs are put in directories of the form
weblogs/<year>/<month>/<day>/ on the
HDFS, then you can generate the top 5 for December 2008 as follows:
$ dumbo start ipcount.py -hadoop /usr/local/hadoop -input weblogs/2008/12/* -output ipcounts
$ dumbo cat ipcounts/part* -hadoop /usr/local/hadoop | sort -k2,2nr | head -n 5
When your Hadoop cluster is large enough, this will still work if
your website gets millions of page views every day, whereas the
corresponding
UNIX command would probably not be able to get the job done when the log files are this big.
Note that the path given via the
-hadoop option is where Dumbo will look for both the
hadoop command (in the
bin/
subdirectory) and the Hadoop Streaming jar. Since these might not
always be in the same directory, you can also specify additional Hadoop
jar search paths via the
-hadooplib option. In case of
CDH4, for instance, you’ll typically want to use the
hadoop command from
/usr/bin/ and the Hadoop Streaming jar from
/usr/lib/hadoop-0.20-mapreduce/, which can easily be achieved by specifying
-hadoop /usr and
-hadooplib /usr/lib/hadoop-0.20-mapreduce.
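Combined with the earlier example, such a CDH4 run would look roughly like this:
$ dumbo start ipcount.py -hadoop /usr -hadooplib /usr/lib/hadoop-0.20-mapreduce \
  -input weblogs/2008/12/* -output ipcounts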
Mapper and reducer classes
When you generate the list of the 5 most frequently occurring IPs,
you might want to exclude certain IPs. Hence, it might be useful to
extend the program above such that it reads a file
excludes.txt consisting of IPs that need to be ignored:
$ head -n 3 excludes.txt
127.0.0.1
72.14.247.99
209.85.171.99
The following Dumbo program is such an extension of the previous program:
class Mapper:
    def __init__(self):
        file = open("excludes.txt", "r")
        self.excludes = set(line.strip() for line in file)
        file.close()
    def __call__(self, key, value):
        ip = value.split(" ")[0]
        if not ip in self.excludes:
            yield ip, 1

def reducer(key, values):
    yield key, sum(values)

if __name__ == "__main__":
    import dumbo
    dumbo.run(Mapper, reducer, combiner=reducer)
The main difference between this program and the previous one is that
the mapper is a class in this case. Since an instance of this class is
created just before the mapping is started, you can use the constructor
for initializations such as reading a file. When running this
program, you need to make sure that the file
excludes.txt is put in the working directory on each cluster node. This can be done with the
-file option:
$ dumbo start ipcount.py -hadoop /usr/local/hadoop -input weblogs/2008/12/* \
-output ipcounts -file excludes.txt
When the excludes file is very big, it would probably be better to use the
-cacheFile option instead, but for small enough files
-file is more convenient. You can find more info about
-cacheFile (and several other useful options) in
Running programs.
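A sketch of what that might look like, assuming the excludes file lives on HDFS and that -cacheFile takes the usual Hadoop Streaming <HDFS path>#<link name> form (see Running programs for the exact syntax):
$ dumbo start ipcount.py -hadoop /usr/local/hadoop -input weblogs/2008/12/* \
  -output ipcounts -cacheFile shared/excludes.txt#excludes.txt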
As you probably expected already, reducers and combiners can be
classes as well. There is no example that illustrates this in this
tutorial though, since the mechanism is completely analogous (and people
tend to dislike reading lengthy tutorials).
Jobs and runners
Now, suppose that you want to find out for which IPs the daily counts
are most often higher than a given number. The following Dumbo program
dailycount.py can be used to generate this information:
class DailyMapper:
    def __init__(self):
        file = open("excludes.txt", "r")
        self.excludes = set(line.strip() for line in file)
        file.close()
    def __call__(self, key, value):
        parts = value.split(" ")
        ip, date = parts[0], parts[3][1:].split(":")[0]
        if not ip in self.excludes:
            yield (ip, date), 1

class FilterMapper:
    def __init__(self):
        self.mincount = int(self.params["mincount"])
    def __call__(self, key, value):
        ip, date = key
        if value >= self.mincount:
            yield ip, 1

def reducer(key, values):
    yield key, sum(values)

if __name__ == "__main__":
    import dumbo
    job = dumbo.Job()
    job.additer(DailyMapper, reducer, combiner=reducer)
    job.additer(FilterMapper, reducer, combiner=reducer)
    job.run()
Running this program can be done as follows:
$ dumbo start dailycount.py -hadoop /usr/local/hadoop -input weblogs/2008/12/* \
-output ipcounts -file excludes.txt -param mincount=100
This example program illustrates how parameters can be passed to Dumbo programs by means of the
-param option, but even more interesting is that it consists of two MapReduce iterations. As shown by this program, a
Job object can be used to register multiple iterations.
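To make the two iterations concrete, here is a sketch of the record flow with entirely made-up counts: the first iteration produces one count per (IP, date) pair, and the second one counts, per IP, how many of those daily counts reach mincount.
# after the first iteration (hypothetical counts):
#   ("66.249.65.107", "10/Dec/2008") -> 513
#   ("66.249.65.107", "11/Dec/2008") -> 87
#   ("72.30.142.85",  "10/Dec/2008") -> 241
# after the second iteration, with mincount=100:
#   "66.249.65.107" -> 1
#   "72.30.142.85"  -> 1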
Note that you can also write
def runner(job):
    job.additer(DailyMapper, reducer, combiner=reducer)
    job.additer(FilterMapper, reducer, combiner=reducer)

if __name__ == "__main__":
    import dumbo
    dumbo.main(runner)
instead of
if __name__ == "__main__":
    import dumbo
    job = dumbo.Job()
    job.additer(DailyMapper, reducer, combiner=reducer)
    job.additer(FilterMapper, reducer, combiner=reducer)
    job.run()
On its own this might not make much of a difference, but using a runner has an
important advantage: it also allows you to use a starter.
Programs and starters
Starters are similar to runners, but instead of running a job they
start a program. Less abstractly, a starter can simplify the start
commands quite a lot. For instance, by adding the code
def starter(program):
    year = program.delopt("year")
    if not year:
        # an alternative (and probably better) way to
        # bail out is to raise dumbo.Error(msg)
        return "'year' not specified"
    month = program.delopt("month")
    if not month: return "'month' not specified"
    if len(month) == 1:
        month = "0" + month
    mincount = program.delopt("mincount")
    if not mincount:
        mincount = "100"
    program.addopt("input", "weblogs/%s/%s/*" % (year, month))
    program.addopt("param", "mincount=" + mincount)
    program.addopt("file", "excludes.txt")

if __name__ == "__main__":
    import dumbo
    dumbo.main(runner, starter)
to the example above, it can be started as follows:
$ dumbo start dailycount.py -hadoop /usr/local/hadoop -year 2008 -month 12 -output ipcounts
The program can then still be started without executing the starter by adding the option
-starter no:
$ dumbo start dailycount.py -hadoop /usr/local/hadoop -input weblogs/2008/12/* \
-output ipcounts -file excludes.txt -param mincount=100 -starter no
Input formats
So far we have always taken text files as input, but Dumbo can deal with other file formats too. The
-inputformat option allows you to specify the format of the input files. Possible values are:
- text: text files
- sequencefile: sequence files
- auto: decide between text and sequencefile based on the file header (this is the default value)
- <name_of_java_class>: use a custom InputFormat (this is rarely needed)
Since
auto is the default value, it usually is not necessary to use the
-inputformat option, but it is safer to do so anyway because
auto
could be misled when the first bytes of a text file happen to
correspond to the sequence file header (which is very unlikely but
possible nevertheless).
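For instance, explicitly telling Dumbo that the web logs are plain text files would look like this:
$ dumbo start ipcount.py -hadoop /usr/local/hadoop -input weblogs/2008/12/* \
  -output ipcounts -inputformat text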
In case of text files, each input value is a string that corresponds
to one line of the file and the keys are the offsets of the lines in the
files (as Python integers). For sequence files, however, the type of
the keys and values can differ from file to file. Most common
writables are converted to suitable Python types, and the remaining writables are converted to a string by means of their
toString() method.
Hadoop records are converted to lists consisting of the values of their attributes.
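As a small illustration (not from the original example), a mapper for text input could use both the offset key and the line value:
def mapper(key, value):
    # key: byte offset of the line within the input file (a Python int)
    # value: the line itself (a string)
    if key == 0:
        return  # e.g. skip a header line at the start of each file
    yield value.split(" ")[0], 1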
Another possible value for the
-inputformat option is
code.
This value was added for dealing with the files that Dumbo outputs, but
since the output files on Hadoop really are sequence files containing
Typed bytes objects,
sequencefile and hence also
auto work for these files as well. For local runs, however, it is necessary to use the option
-inputformat code
when you want to take output files as input, since Dumbo programs that
run locally cannot deal with sequence files (and therefore their output
is stored in special text files instead of sequence files). To print a
file from
HDFS as a “code” file for local processing, you can use
dumbo cat with the option
-ascode yes.
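Putting this together, preparing previously generated Hadoop output for a local run might look roughly like this (postprocess.py being some hypothetical follow-up program):
$ dumbo cat ipcounts/part* -hadoop /usr/local/hadoop -ascode yes > ipcounts.code
$ dumbo start postprocess.py -input ipcounts.code -inputformat code -output results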
Eggs and jars
Python modules contained in
eggs can be used in Dumbo programs by adding
-libegg <path_to_egg> options. Similarly, jars can be used by adding
-libjar <path_to_jar>
options. A common use case for this is when the input consists of
sequence files that contain custom Hadoop records. In order to be able
to read such input, the classes for the records need to be put on the
classpath, and adding a
-libjar for the jar(s) that contain(s) these classes is a possible way of doing this.
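For example, a program that needs a Python module packaged as an egg as well as custom record classes from a jar could be started roughly as follows (the egg and jar paths are hypothetical):
$ dumbo start myprogram.py -hadoop /usr/local/hadoop -input records.seq -output counts \
  -libegg lib/mymodule-1.0.egg -libjar lib/customrecords.jar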
Further reading