Saturday, June 30, 2012

Some Linux skills required by jobs

Experience installing Linux servers using PXE (Preboot Execution Environment), e.g. Cobbler/Kickstart.
Experience working with SDLC processes
ITIL standards
SAN: storage area network
SAN/NAS/storage solutions experience with NetApp or NFS

Nginx, Ganglia, Zenoss
Exceed
X server  
Experience working with F5 web load balancing systems
Iperf, MTR
Experience in network administration is a must – IPsec, LAN, VLANs, iptables, subnets and firewall setup, ACLs, switches, routing, DNS, load balancers

Writing plugins for Munin and Nagios.
Familiarity with basic Python/WSGI application stacks on Linux

http://nginx.org/en/

http://en.wikipedia.org/wiki/Jenkins_%28software%29 


PREFERRED EXPERIENCE:
  • Experience in a large, highly-available Linux environment
  • Experience with MCollective
  • Open Source contributions (github account)
  • Experience writing custom Puppet types and functions
  • Experience with Ganeti
  • Experience with Git
Puppet, CFEngine, Chef, and Bcfg2: these are all systems automation tools.

Thursday, June 28, 2012

Introduction to Data Normalization: A Database "Best" Practice

Data normalization is a process in which data attributes within a data model are organized to increase the cohesion of entity types.  In other words, the goal of data normalization is to reduce and even eliminate data redundancy, an important consideration for application developers because it is incredibly difficult to store objects in a relational database that maintains the same information in several places.  Table 1 summarizes the three most common forms of normalization (first normal form (1NF), second normal form (2NF), and third normal form (3NF)), describing how to put entity types into a series of increasing levels of normalization.  Higher levels of data normalization are beyond the scope of this article.  With respect to terminology, a data schema is considered to be at the level of normalization of its least normalized entity type.  For example, if all of your entity types are at second normal form (2NF) or higher, then we say that your data schema is at 2NF.

Table 1. Data Normalization Rules.

Level                      Rule
First normal form (1NF)    An entity type is in 1NF when it contains no repeating groups of data.
Second normal form (2NF)   An entity type is in 2NF when it is in 1NF and when all of its non-key attributes are fully dependent on its primary key.
Third normal form (3NF)    An entity type is in 3NF when it is in 2NF and when all of its attributes are directly dependent on the primary key.

1. First Normal Form (1NF)

Let’s consider an example.  An entity type is in first normal form (1NF) when it contains no repeating groups of data.  For example, in Figure 1 you see that there are several repeating attributes in the Order0NF table – the ordered item information repeats nine times and the contact information is repeated twice, once for shipping information and once for billing information.  Although this initial version of orders could work, what happens when an order has more than nine order items?  Do you create additional order records for them?  What about the vast majority of orders that only have one or two items?  Do we really want to waste all that storage space in the database for the empty fields?  Likely not.  Furthermore, do you want to write the code required to process the nine copies of item information, even if it is only to marshal it back and forth between the appropriate number of objects?  Once again, likely not.

Figure 1. An Initial Data Schema for Order (UML Notation).
Figure 2 presents a reworked data schema where the order schema is put in first normal form.  The introduction of the OrderItem1NF table enables us to have as many, or as few, order items associated with an order, increasing the flexibility of our schema while reducing storage requirements for small orders (the majority of our business).  The ContactInformation1NF table offers a similar benefit, when an order is shipped and billed to the same person (once again the majority of cases) we could use the same contact information record in the database to reduce data redundancy.  OrderPayment1NF was introduced to enable customers to make several payments against an order – Order0NF could accept up to two payments, the type being something like “MC” and the description “MasterCard Payment”, although with the new approach far more than two payments could be supported.  Multiple payments are accepted only when the total of an order is large enough that a customer must pay via more than one approach, perhaps paying some by check and some by credit card.

Figure 2. An Order Data Schema in 1NF (UML Notation).
An important thing to notice is the application of primary and foreign keys in the new solution.  Order1NF has kept OrderID, the original key of Order0NF, as its primary key.  To maintain the relationship back to Order1NF, the OrderItem1NF table includes the OrderID column within its schema, which is why it has the stereotype of FK.  When a new table such as OrderItem1NF is introduced into a schema as the result of first normalization efforts, it is common to use the primary key of the original table (Order0NF) as part of the primary key of the new table.  Because OrderID is not unique for order items (you can have several order items on an order), the column ItemSequence was added to form a composite primary key for the OrderItem1NF table.  A different approach to keys was taken with the ContactInformation1NF table: the column ContactID, a surrogate key that has no business meaning, was made the primary key.
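The move from repeating groups to a child table can be sketched in a few lines of Python. These are plain dictionaries standing in for rows, not a real database; the table and column names are modeled on the figures above and the values are invented:

```python
# 0NF: fixed item "slots" repeat inside the order record, mostly unused.
order_0nf = {
    "OrderID": 1001,
    "Item1Name": "widget", "Item1Qty": 3,
    "Item2Name": None, "Item2Qty": None,   # wasted space, and so on
    # ... up to Item9Name / Item9Qty
}

# 1NF: the repeating group becomes rows in a separate table, keyed by
# the order's key (OrderID) plus an item sequence number.
order_1nf = {"OrderID": 1001}
order_item_1nf = [
    {"OrderID": 1001, "ItemSequence": 1, "Name": "widget", "Qty": 3},
]

# An order can now have as many, or as few, items as it needs,
# with no empty columns reserved for items that never materialize.
order_item_1nf.append(
    {"OrderID": 1001, "ItemSequence": 2, "Name": "gadget", "Qty": 1})

print(len(order_item_1nf))  # 2
```

Note how each child row carries OrderID back to its parent and adds ItemSequence to make the composite key unique, exactly as described above.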

2. Second Normal Form (2NF)

Although the solution presented in Figure 2 is improved over that of Figure 1, it can be normalized further.  Figure 3 presents the data schema of Figure 2 in second normal form (2NF).  An entity type is in second normal form (2NF) when it is in 1NF and when every non-key attribute, any attribute that is not part of the primary key, is fully dependent on the primary key.  This was definitely not the case with the OrderItem1NF table, therefore we need to introduce the new table Item2NF.  The problem with OrderItem1NF is that item information, such as the name and price of an item, does not depend upon an order for that item.  For example, if Hal Jordan orders three widgets and Oliver Queen orders five widgets, the facts that the item is called a “widget” and that the unit price is $19.95 are constant.  This information depends on the concept of an item, not the concept of an order for an item, and therefore should not be stored in the order items table – this is why the Item2NF table was introduced.  OrderItem2NF retained the TotalPriceExtended column, a calculated value that is the number of items ordered multiplied by the price of the item.  The value of the SubtotalBeforeTax column within the Order2NF table is the total of the values of the total price extended for each of its order items.
Figure 3. An Order Data Schema in 2NF (UML Notation).
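The 2NF split can be sketched the same way. Again these are illustrative Python structures, not real tables; the item number 101 is invented, while the $19.95 widget price comes from the example above:

```python
# Item facts (name, unit price) are stored once in item_2nf and
# looked up by item number from each order item.
item_2nf = {101: {"Name": "widget", "UnitPrice": 19.95}}

order_item_2nf = [
    {"OrderID": 1001, "ItemID": 101, "Qty": 3},  # Hal's three widgets
    {"OrderID": 1002, "ItemID": 101, "Qty": 5},  # Oliver's five widgets
]

# TotalPriceExtended can be derived on demand instead of stored,
# so the unit price lives in exactly one place.
def total_price_extended(line):
    return line["Qty"] * item_2nf[line["ItemID"]]["UnitPrice"]

print([round(total_price_extended(line), 2) for line in order_item_2nf])
# [59.85, 99.75]
```

If the widget price changes, only the single row in item_2nf needs updating, which is the whole point of the split.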

3. Third Normal Form (3NF)

An entity type is in third normal form (3NF) when it is in 2NF and when all of its attributes are directly dependent on the primary key.  A better way to word this rule might be that the attributes of an entity type must depend on all portions of the primary key.  In this case there is a problem with the OrderPayment2NF table: the payment type description (such as “Mastercard” or “Check”) depends only on the payment type, not on the combination of the order ID and the payment type.  To resolve this problem the PaymentType3NF table was introduced in Figure 4, containing a description of the payment type as well as a unique identifier for each payment type.

Figure 4. An Order Data Schema in 3NF (UML Notation).

4. Beyond 3NF

The data schema of Figure 4 can still be improved upon, at least from the point of view of data redundancy, by removing attributes that can be calculated/derived from other ones.  In this case we could remove the SubtotalBeforeTax column within the Order3NF table and the TotalPriceExtended column of OrderItem3NF, as you see in Figure 5.

Figure 5. An Order Without Calculated Values (UML Notation).

5. Why Data Normalization?

The advantage of having a highly normalized data schema is that information is stored in one place and one place only, reducing the possibility of inconsistent data.  Furthermore, highly normalized data schemas in general are closer conceptually to object-oriented schemas because the object-oriented goals of promoting high cohesion and loose coupling between classes result in similar solutions (at least from a data point of view).  This generally makes it easier to map your objects to your data schema.

6. Denormalization

From a purist point of view you want to normalize your data structures as much as possible, but from a practical point of view you will find that you need to “back out” of some of your normalizations for performance reasons.  This is called “denormalization”.  For example, with the data schema of Figure 1 all the data for a single order is stored in one row (assuming orders of up to nine order items), making it very easy to access.  With the data schema of Figure 1 you could quickly determine the total amount of an order by reading the single row from the Order0NF table.  To do so with the data schema of Figure 5 you would need to read data from a row in the Order table, data from all the rows from the OrderItem table for that order, and data from the corresponding rows in the Item table for each order item.  For this query, the data schema of Figure 1 very likely provides better performance.
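The trade-off can be sketched with the same toy structures: a hypothetical denormalized row versus the normalized tables. All names and values here are invented for illustration:

```python
# Denormalized: one read returns the subtotal directly.
order_denorm = {"OrderID": 1001, "SubtotalBeforeTax": 69.84}

# Normalized: the subtotal must be derived from two more tables.
order_item = [
    {"OrderID": 1001, "ItemID": 101, "Qty": 3},
    {"OrderID": 1001, "ItemID": 102, "Qty": 1},
]
item = {101: {"UnitPrice": 19.95}, 102: {"UnitPrice": 9.99}}

def subtotal_before_tax(order_id):
    # Walk every order item for the order and look up each unit price.
    return sum(line["Qty"] * item[line["ItemID"]]["UnitPrice"]
               for line in order_item if line["OrderID"] == order_id)

# Same answer either way; the normalized version just works harder.
print(round(subtotal_before_tax(1001), 2))  # 69.84
```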

7. Acknowledgements

I'd like to thank Jon Heggland for his thoughtful review and feedback.  He found several bugs which had gotten by both myself and my tech reviewers. 


Wednesday, June 27, 2012

Introduction to Object-Orientation and the UML

The prevalence of programming languages such as Java, C++, Object Pascal, C#, and Visual Basic makes it incredibly clear that object-oriented technology has become the approach of choice for new development projects.  Although procedural languages such as COBOL and PL/1 will likely be with us for decades, it is clear that most organizations now consider these environments legacy technologies that must be maintained and ideally retired at some point.  Progress marches on.  My experience is that agile software developers, be they application developers or Agile DBAs, must minimally have an understanding of object orientation if they are to be effective.  This includes understanding basic concepts such as inheritance, polymorphism, and object persistence.  Furthermore, all developers should have a basic understanding of the industry-standard Unified Modeling Language (UML).  A good starting point is to understand what I consider to be the core UML diagrams: use case diagrams, sequence diagrams, and class diagrams – although as I argued in An Introduction to Agile Modeling and Agile Documentation you must be willing to learn more models over time.



One of the advantages of working closely with other IT professionals is that you learn new skills from them, and the most effective object developers will learn and adapt fundamental concepts from other disciplines. An example is class normalization, the object-oriented version of data normalization, a collection of simple rules for reducing coupling and increasing cohesion within your object designs.
This article overviews the fundamental concepts and techniques that application developers use on a daily basis when working with object technology.  This article is aimed at Agile DBAs that want to gain a basic understanding of the object paradigm, allowing them to understand where application developers are coming from.  The primary goal of this article is to provide Agile DBAs with enough of an understanding of objects so that they have a basis from which to communicate with application developers.  Similarly, other articles overview fundamental data concepts, such as relational database technology and data modeling, that application developers need to learn so that they understand where Agile DBAs are coming from.

Table of Contents

  1. Object-Oriented Concepts
  2. The Unified Modeling Language  
  3. Class Normalization
  4. What Have You Learned?

1. Object-Oriented Concepts

Agile software developers, including Agile DBAs, need to be familiar with the basic concepts of object-orientation. The object-oriented (OO) paradigm is a development strategy based on the concept that systems should be built from a collection of reusable components called objects.  Instead of separating data and functionality as is done in the structured paradigm, objects encompass both.  While the object-oriented paradigm sounds similar to the structured paradigm, as you will see at this site it is actually quite different.  A common mistake that many experienced developers make is to assume that they have been “doing objects” all along just because they have been applying similar software-engineering principles.  To succeed you must recognize that the OO approach is different than the structured approach.
To understand OO you need to understand common object terminology.  The critical terms to understand are summarized in Table 1.  I present a much more detailed explanation of these terms in The Object Primer 3/e.  Some of these concepts you will have seen before, and some of them you haven’t.  Many OO concepts, such as encapsulation, coupling, and cohesion, come from software engineering.  These concepts are important because they underpin good OO design.  The main point to be made here is that you do not want to deceive yourself – just because you have seen some of these concepts before, it doesn’t mean you were doing OO; it just means you were doing good design.  While good design is a big part of object-orientation, there is still a lot more to it than that.

Table 1. A summary of common object-oriented terms.

Abstract class – A class that does not have objects instantiated from it
Abstraction – The identification of the essential characteristics of an item
Aggregation – Represents “is part of” or “contains” relationships between two classes or components
Aggregation hierarchy – A set of classes that are related through aggregation
Association – Objects are related (associated) to other objects
Attribute – Something that a class knows (data/information)
Class – A software abstraction of similar objects, a template from which objects are created
Cohesion – The degree of relatedness of an encapsulated unit (such as a component or a class)
Collaboration – Classes work together (collaborate) to fulfill their responsibilities
Composition – A strong form of aggregation in which the “whole” is completely responsible for its parts and each “part” object is only associated to the one “whole” object
Concrete class – A class that has objects instantiated from it
Coupling – The degree of dependence between two items
Encapsulation – The grouping of related concepts into one item, such as a class or component
Information hiding – The restriction of external access to attributes
Inheritance – Represents “is a”, “is like”, and “is kind of” relationships.  When class “B” inherits from class “A” it automatically has all of the attributes and operations that “A” implements (or inherits from other classes)
Inheritance hierarchy – A set of classes that are related through inheritance
Instance – An object is an instance of a class
Instantiate – We instantiate (create) objects from classes
Interface – The definition of a collection of one or more operation signatures that defines a cohesive set of behaviors
Message – A message is either a request for information or a request to perform an action
Messaging – In order to collaborate, classes send messages to each other
Multiple inheritance – When a class directly inherits from more than one class
Multiplicity – A UML concept combining the data modeling concepts of cardinality (how many) and optionality
Object – A person, place, thing, event, concept, screen, or report
Object space – Main memory plus all available storage space on the network, including persistent storage such as a relational database
Operation – Something a class does (similar to a function in structured programming)
Override – Sometimes you need to override (redefine) attributes and/or methods in subclasses
Pattern – A reusable solution to a common problem taking relevant forces into account
Persistence – The issue of how objects are permanently stored
Persistent object – An object that is saved to permanent storage
Polymorphism – Different objects can respond to the same message in different ways, enabling objects to interact with one another without knowing their exact type
Single inheritance – When a class directly inherits from only one class
Stereotype – Denotes a common usage of a modeling element
Subclass – If class “B” inherits from class “A,” we say that “B” is a subclass of “A”
Superclass – If class “B” inherits from class “A,” we say that “A” is a superclass of “B”
Transient object – An object that is not saved to permanent storage
It is important for Agile DBAs to understand the terms presented above because the application developers that you work with will use these terms, and many others, on a regular basis.  To communicate effectively with application developers you must understand their vocabulary, and they must understand yours.  Another important aspect of learning the basics of object orientation is to understand each of the diagrams of the Unified Modeling Language (UML) – you don’t need to become a UML expert, but you do need to learn the basics.
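Several of the terms above – class, attribute, operation, superclass, subclass, override, instantiate, and polymorphism – can be seen together in a small sketch. The Account and SavingsAccount classes are invented purely for illustration:

```python
class Account:                      # a concrete class (the superclass)
    def __init__(self, balance):
        self.balance = balance      # attribute: something the class knows

    def monthly_fee(self):          # operation: something the class does
        return 5.00

class SavingsAccount(Account):      # subclass: an "is a" relationship
    def monthly_fee(self):          # override: redefine the inherited operation
        return 0.00

# Instantiate objects from the classes, then send each the same message.
# Polymorphism: different objects respond to the same message in
# different ways, without the caller knowing their exact type.
accounts = [Account(100.0), SavingsAccount(250.0)]
fees = [a.monthly_fee() for a in accounts]
print(fees)  # [5.0, 0.0]
```

Note that SavingsAccount automatically has the balance attribute it inherits from Account; it only redefines the behavior it needs to change.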

2. An Overview of The Unified Modeling Language

The goal of this section is to provide you with a basic overview of the UML; it is not to teach you the details of each individual technique.  Much of the descriptive material in this section is modified from The Elements of UML Style, a pocket-sized book that describes proven guidelines for developing high-quality and readable UML diagrams, and the examples from The Object Primer 3/e.  A good starting point for learning the UML is UML Distilled, as it is well written and concise.  If you want a more thorough look at the UML, as well as other important models that the UML does not include, then you’ll find The Object Primer 3/e to be a better option.
It is also important to understand that you don’t need to learn all of the UML notation available to you – and believe me, there’s a lot – but only the notation that you’ll use in practice.  The examples presented in this section (there is one for each UML diagram) use the core UML.  As you learn each diagram, focus on learning the core notation first; you can learn the rest of the notation over time as you need to.

2.1 Core UML Diagrams

Let’s begin with what I consider to be the three core UML diagrams for developing business software: UML use case diagrams, UML sequence diagrams, and UML class diagrams.  These are the diagrams that you will see used the most in practice – use case diagrams to overview usage requirements, sequence diagrams to analyze the use cases and map to your classes, and class diagrams to explore the structure of your object-oriented software (what I like to refer to as your object schema).  These three diagrams will cover 80% of your object modeling needs when building a business application using object technology.

2.1.1 UML Use Case Diagrams

According to the UML specification a use case diagram is “a diagram that shows the relationships among actors and use cases within a system.”  Use case diagrams are often used to provide an overview of the usage requirements for a system.
Figure 1 depicts a simple use case diagram.  This diagram depicts several use cases, actors, their associations, and optional system boundary boxes.  A use case describes a sequence of actions that provide a measurable value to an actor and is drawn as a horizontal ellipse.  An actor is a person, organization, or external system that plays a role in one or more interactions with your system.  Actors are drawn as stick figures.  Associations between actors and use cases are indicated in use case diagrams; a relationship exists whenever an actor is involved with an interaction described by a use case.  Associations between actors and use cases are modeled as lines connecting them to one another, with an optional arrowhead on one end of the line indicating the direction of the initial invocation of the relationship.


Associations also exist between use cases in system use case models and are depicted using dashed lines with the UML stereotypes of <<extend>> or <<include>>, as you see in Figure 2.  It is also possible to model inheritance between use cases, something that is not shown in the diagram.   The rectangle around the use cases is called the system boundary box and as the name suggests it delimits the scope of your system – the use cases inside the rectangle represent the functionality that you intend to implement.  
Figure 2. Associations between use cases.

Figure 3 depicts an example of a use case diagram for a university information system.  This is the level of detail that you would commonly see with use case diagrams in practice.  A good reference is UML use case diagram style guidelines.

Figure 3. A use case diagram for university application.

2.1.2 UML Sequence Diagrams

UML sequence diagrams are a dynamic modeling technique, as are collaboration diagrams and activity diagrams described below.  UML sequence diagrams are typically used to:
  • Validate and flesh out the logic of a usage scenario.  A usage scenario is exactly what its name indicates – the description of a potential way that your system is used.  The logic of a usage scenario may be part of a use case, perhaps an alternate course; one entire pass through a use case, such as the logic described by the basic course of action or a portion of the basic course of action plus one or more alternate scenarios; or a pass through the logic contained in several use cases, for example a student enrolls in the university then immediately enrolls in three seminars. 
  • Explore your design because they provide a way for you to visually step through invocation of the operations defined by your classes.
  • Detect bottlenecks within an object-oriented design.  By looking at what messages are being sent to an object, and by looking at roughly how long it takes to run the invoked method, you quickly get an understanding of where you need to change your design to distribute the load within your system.  In fact some CASE tools even enable you to simulate this aspect of your software. 
  • Give you a feel for which classes in your application are going to be complex, which in turn is an indication that you may need to draw state chart diagrams for those classes.
For example, Figure 4 models a portion of the basic course of action for the "Enroll in Seminar" use case.  The boxes across the top of the diagram represent classifiers or their instances, typically use cases, objects, classes, or actors.  Because you can send messages to both objects and classes (objects respond to messages through the invocation of an operation and classes do so through the invocation of static operations), it makes sense to include both on sequence diagrams.  Because actors initiate and take an active part in usage scenarios, they are also included in sequence diagrams.  Objects have labels in the standard UML format “name: ClassName” where “name” is optional (objects that have not been given a name on the diagram are called anonymous objects).  Classes have labels in the format "ClassName," and actors have names in the format "Actor Name" – both common naming conventions.


I have a tendency to hand draw sequence diagrams on whiteboards.  Two such examples are shown in Figure 5 and Figure 6.  Figure 5 depicts a UML sequence diagram for the Enroll in University use case, taking a system-level approach where the interactions between the actors and the system are shown.  Figure 6 depicts a sequence diagram for the detailed logic of a service to determine if an applicant is already a student at the university.



UML sequence diagramming is described in detail here, and a good style reference is UML sequence diagram style guidelines.

2.1.3 UML Class Diagrams

UML class diagrams show the classes of the system, their inter-relationships, and the operations and attributes of the classes.  Class diagrams are typically used, although not all at once, to:
  • Explore domain concepts in the form of a domain model
  • Analyze requirements in the form of a conceptual/analysis model
  • Depict the detailed design of object-oriented or object-based software
A class model comprises one or more class diagrams and the supporting specifications that describe model elements, including classes, relationships between classes, and interfaces.  Figure 3 depicts an example of an analysis UML class diagram.  Classes are shown as boxes with three sections – the top for the name of the class, the middle for the attributes, and the bottom for the operations.  Associations between classes are depicted as lines between classes.  Associations should include multiplicity indicators at each end, for example 0..1 representing “zero or one” and 1..* representing “one or more”.  Associations may have roles indicated; for example the mentors association, a recursive relation that professor objects have with other professor objects, indicates the roles of advisor and associate.  A design class model would show greater detail.  For example, it is common to see the visibility and type of attributes depicted on design class diagrams as well as full operation signatures.
 

A detailed description of class diagramming is provided here, and a good style reference at UML class diagram style guidelines.

2.2 Different Goals, Different Core Diagrams

What happens if you're not developing business applications – are there different core diagrams?  Yes.  For real-time or embedded systems the core diagrams are typically UML state machine diagrams, UML communication diagrams (or UML sequence diagrams, depending on your team's preference), and UML class diagrams.  For architecture efforts the core diagrams are often UML deployment and UML component diagrams.  All of these diagrams are valuable in the right situations.  Every agile software developer should learn how to work with these diagrams at some point in their career, but they aren’t likely to be the first model types that you learn.

3. Class Normalization

In the data world there is a common process called data normalization by which you organize data in such a way as to reduce and even eliminate data redundancy, effectively increasing the cohesiveness of data entities.  Can the techniques of data normalization be applied to object schemas?  Yes, but this isn’t an ideal approach because data normalization only deals with data and not behavior.  We need to consider both when normalizing our object schema, so we need to rethink our approach.  Class normalization is a process by which you reorganize the structure of your object schema in such a way as to increase the cohesion of classes while minimizing the coupling between them.
Fundamentally class normalization is a technique for improving the quality of your object schemas.  The exact same thing can be said of the application of common design patterns, such as those defined by the “Gang of Four (GoF)” in Design Patterns (Gamma et al. 1995).  Design patterns are known solutions to common problems; examples include the Strategy pattern for implementing a collection of related algorithms and the Singleton pattern for implementing a class that has only one instance.  The application of common design patterns will often result in a highly normalized object schema, although the overzealous application of design patterns can result in you overbuilding your software unnecessarily.  As Agile Modeling (AM) suggests, you should follow the practice Apply Patterns Gently and ease into a design pattern over time.
Another common approach to improving object schemas is refactoring (Fowler 1999).  Refactoring is a disciplined way to restructure code by applying small changes to your code to improve its design.  Refactoring enables you to evolve your design slowly over time.  Class normalization and refactoring fit together quite well – as you’re normalizing your classes you will effectively be applying many known refactorings to your object schema.  A fundamental difference between class normalization and refactoring is that class normalization is typically performed to your models whereas refactorings are applied to your source code.
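As a rough sketch of how normalizing a class increases cohesion, consider the classic "extract class" refactoring. The classes below are invented for illustration and are not from any of the books cited above:

```python
# Before: Order knows order totals AND how to format a mailing label --
# two unrelated responsibilities, so the class has low cohesion.
class OrderBefore:
    def __init__(self, total, street, city):
        self.total, self.street, self.city = total, street, city

    def mailing_label(self):
        return f"{self.street}, {self.city}"

# After: the address responsibility is extracted into its own class,
# raising cohesion and leaving the classes loosely coupled.
class Address:
    def __init__(self, street, city):
        self.street, self.city = street, city

    def mailing_label(self):
        return f"{self.street}, {self.city}"

class Order:
    def __init__(self, total, address):
        self.total = total        # Order keeps only order concerns
        self.address = address    # and collaborates with Address

order = Order(19.95, Address("1 Main St", "Springfield"))
print(order.address.mailing_label())  # 1 Main St, Springfield
```

Performed on a class diagram this is a class normalization step; performed on source code it is a refactoring, which is exactly the relationship described above.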

4. What Have You Learned?

This article presented a very brief overview of object-orientation (OO).  I started with a summary of common OO terms to help you to understand the fundamental vocabulary that OO developers use.  The table of definitions is a good start but that’s all it is, a good start.  If you truly want to understand these terms, and their implications, you’ll need to do some more reading.  You will also need to roll up your sleeves and work with object technology for several years to truly understand the OO paradigm, reading isn’t enough.  
The next section summarized the artifacts of the Unified Modeling Language (UML), describing each type of UML diagram and its common usage, and provided a quick example of each one.  An important thing to understand about the UML is that if you are new to it you should start with the core diagrams that are appropriate to your situation.  For business application development, use case diagrams, sequence diagrams, and class diagrams are the core diagrams in my experience.  Furthermore, you don’t need to learn all of the notation at first, and you may never need to learn all of it; you just need to learn enough notation to create models that are just barely good enough for your situation.  Finally, you need to recognize that this article provided a brief overview of the UML; you’ll want to read other books that present a much more detailed description if you wish to learn to apply it effectively.
The third section overviewed an object-oriented design technique called class normalization, the OO equivalent of data normalization.  Although these techniques aren’t as popular as refactoring or the application of design patterns, I believe that they are important because they provide a very good bridge between the object and data paradigms.  The rules of class normalization codify what effective object designers have been doing for years, so there is really nothing new in that respect.  However, they describe basic object design techniques in a manner that data professionals such as Agile DBAs can readily understand, helping to improve the communication within your project teams.
My hope is that you have discovered that there is a fair bit to OO.  I also hope that you recognize that there is some value in at least understanding the basic fundamentals of OO, and better yet you may even decide to gain more experience in it.  Object technology is real, being used for mission-critical systems, and is here to stay.  At a minimum every IT professional needs to be familiar with it.


Tuesday, June 26, 2012

LSF




configuration file: lsf.conf
sudoers file: lsf.sudoers

manage a host:
$badmin hclose hostA
$badmin hopen hostA
$badmin hshutdown hostA

manage a queue:
$badmin qclose normal #no more jobs may be submitted
$badmin qopen normal
$badmin qinact normal #jobs not started
$badmin qact normal
$badmin qhist #view a queue history

manage the daemons:
$badmin mbdrestart
$lsadmin limstartup hostA
$lsadmin limshutdown hostA
$lsadmin resstartup hostB
$lsadmin resshutdown hostB

Set cluster admins:
edit file: lsf.cluster.cluster_name
Begin ClusterAdmins
ADMINISTRATORS = lsfadmin group6 admin3 admin4
End ClusterAdmins


To make the changes effective now:
$badmin reconfig










LSF and PBS scheduling

LSF: bsub

7 required daemons: LIM, Master LIM, ELIM, RES, MBD, SBD, PIM
main configuration files: lsb.params; lsb.queues
commonly used commands:
bsub; bparams; bjobs; bhosts; bqueues; busers;


other commands:
lshosts; lsid; lsclusters; lsinfo; lsload
bhist; #job history



other configuration files:
lsf.cluster.cluster_name #LIM configuration file

lsb.events #event log

PBS: qsub
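For comparison, minimal submission scripts for the two schedulers can be sketched as below; queue names and resource values are illustrative assumptions, not site defaults:

```shell
# Equivalent minimal batch scripts for LSF (bsub) and PBS (qsub).

cat > job.lsf <<'EOF'
#!/bin/sh
#BSUB -q normal        # queue
#BSUB -n 4             # number of slots
#BSUB -o out.%J        # stdout file (%J = job ID)
./my_program
EOF

cat > job.pbs <<'EOF'
#!/bin/sh
#PBS -q normal         # queue
#PBS -l nodes=1:ppn=4  # 1 node, 4 processors
#PBS -o out.log
./my_program
EOF

# Submit with: bsub < job.lsf    (LSF reads the script on stdin)
#          or: qsub job.pbs      (PBS takes the script as an argument)
```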

Monday, June 25, 2012

Copy and paste text with vi or vim

Posted by Rex in UNIX

The ability to duplicate text in an editor can be handy. vi and vim have several useful copy and paste commands.


The command ‘Y’ or ‘yy’ copies (yanks) one or more lines. To copy one line, two lines, 10 lines, and all lines to the end of the file, respectively:
Y
2Y
10Y
yG

To paste the text contained in the buffer above (uppercase P) or below the current cursor position (lowercase p), respectively:
P
p
It is also possible to yank text within a line. The following commands yank text from the current cursor position to the end of the word and the end of the line, respectively:
yw
y$

The same commands paste the text within a line. Lower case p pastes after the cursor position and upper case P pastes before.
Paste will also work with deleted text, either whole lines or parts of lines. Be careful not to yank or delete anything else before pasting, as that will replace the buffer contents.

Saturday, June 23, 2012

Linux IPTables: How to Add Firewall Rules (With Allow SSH Example)


by Ramesh Natarajan on February 14, 2011
This article explains how to add iptables firewall rules using the “iptables -A” (append) command.
“-A” is for append. If it is easier for you to remember “-A” as add-rule (instead of append-rule), that is fine. But it is very important to keep in mind that “-A” adds the rule at the end of the chain.

Typically the last rule will be to drop all packets. If you already have a rule to drop all packets and you use “-A” from the command line to create a new rule, the new rule will be added after the current “drop all packets” rule, which makes it pretty much useless.
Once you’ve mastered iptables and are implementing it in production, you should use a shell script in which you use the -A command to add all the rules. In that script, the last line should always be the “drop all packets” rule. When you want to add new rules, modify the script and add them above the “drop all packets” rule.
Syntax:
iptables -A chain firewall-rule
  • -A chain – Specify the chain where the rule should be appended. For example, use INPUT chain for incoming packets, and OUTPUT for outgoing packets.
  • firewall-rule – Various parameters make up the firewall rule.
If you don’t know what a chain is, you should read about iptables fundamentals first.

Firewall Rule Parameters

The following parameters are available for all kinds of firewall rules.

-p is for protocol

  • Indicates the protocol for the rule.
  • Possible values are tcp, udp, icmp
  • Use “all” to allow all protocols. When you don’t specify -p, “all” is the default. It is not good practice to use “all”; always specify a protocol.
  • Use either the name (for example: tcp) or the number (for example: 6 for tcp) for the protocol.
  • The /etc/protocols file contains all allowed protocol names and numbers.
  • You can also use --protocol
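As a quick illustration of the /etc/protocols lookup mentioned above (assuming a standard Linux /etc/protocols file):

```shell
# Print the protocol number registered for tcp in /etc/protocols.
tcp_proto=$(awk '$1 == "tcp" { print $2; exit }' /etc/protocols)
echo "tcp is protocol number $tcp_proto"
```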

-s is for source

  • Indicates the source of the packet.
  • This can be an ip address, a network address, or a hostname
  • For example: -s 192.168.1.101 indicates a specific ip address
  • For a network mask, use /mask. For example: “-s 192.168.1.0/24” represents a network mask of 255.255.255.0 for that network. This matches the 192.168.1.x network.
  • When you don’t specify a source, it matches all sources.
  • You can also use --src or --source

-d is for destination

  • Indicates the destination of the packet.
  • This is the same as “-s” (except it represents the destination host, ip-address, or network)
  • You can also use --dst or --destination

-j is for target

  • j stands for “jump to target”
  • This specifies what needs to happen to the packet that matches this firewall rule.
  • Possible values are ACCEPT, DROP, QUEUE, RETURN
  • You can also specify a user-defined chain as the target value.

-i is for in interface

  • i stands for “input interface”
  • You might overlook this and assume that “-i” is for interface. Note that both -i and -o are for interfaces: -i is the input interface and -o is the output interface.
  • Indicates the interface through which incoming packets arrive, in the INPUT, FORWARD, and PREROUTING chains.
  • For example: -i eth0 indicates that this rule should consider incoming packets arriving through the interface eth0.
  • If you don’t specify the -i option, all available interfaces on the system will be considered for incoming packets.
  • You can also use --in-interface

-o is for out interface

  • o stands for “output interface”
  • Indicates the interface through which outgoing packets are sent, in the OUTPUT, FORWARD, and POSTROUTING chains.
  • If you don’t specify the -o option, all available interfaces on the system will be considered for outgoing packets.
  • You can also use --out-interface

Additional Options for Firewall Parameters

Some of the above firewall parameters in turn have their own options that can be passed along with them. The following are some of the most common.
To use these parameter options, you should specify the corresponding parameter in the firewall rule. For example, to use the “--sport” option, you should have specified the “-p tcp” (or “-p udp”) parameter in your firewall rule.
Note: All of these options take two dashes in front of them. For example, there are two hyphens in front of --sport.

--sport is for source port (for -p tcp, or -p udp)

  • By default all source ports are matched.
  • You can specify either the port number or the name. For example, to use the SSH port in your firewall rule, use either “--sport 22” or “--sport ssh”.
  • The /etc/services file contains all allowed port names and numbers.
  • Using the port number in the rule is better (for performance) than using the port name.
  • To match a range of ports, use a colon. For example, 22:100 matches port numbers from 22 through 100.
  • You can also use --source-port

--dport is for destination port (for -p tcp, or -p udp)

  • Everything is the same as --sport, except this is for destination ports.
  • You can also use --destination-port

--tcp-flags is for TCP flags (for -p tcp)

  • This can contain multiple values separated by comma.
  • Possible values are: SYN, ACK, FIN, RST, URG, PSH. You can also use ALL or NONE

--icmp-type is for ICMP Type (for -p icmp)

  • When you use the icmp protocol “-p icmp”, you can also specify the ICMP type using the “--icmp-type” option.
  • For example: use “--icmp-type 0” for “Echo Reply”, and “--icmp-type 8” for “Echo Request”.

Example Firewall Rule to Allow Incoming SSH Connections

Now that you understand the various parameters (and their options) of a firewall rule, let us build a sample firewall rule.
In this example, let us allow only the incoming SSH connection to the server. All other connections will be blocked (including ping).
WARNING: Playing with firewall rules might render your system inaccessible. If you don’t know what you are doing, you might lock yourself (and everybody else) out of the system. So, do all your learning on a test system that nobody else uses, and make sure you have console access to reset iptables if you get locked out.

1. Delete Existing Rules

If you already have some iptables rules, take a backup before deleting them.
Delete all the existing rules and allow the firewall to accept everything. Use an iptables flush, as we discussed earlier, to clean up all your existing rules and start from scratch.
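The flush step can be scripted; this is a sketch that only writes the script to a file, since actually loading it requires root. Resetting the default policies to ACCEPT before flushing is a common precaution so the flush itself cannot lock you out:

```shell
# Write a reset script that flushes all rules and accepts everything.
cat > flush-iptables.sh <<'EOF'
#!/bin/sh
iptables -P INPUT ACCEPT     # reset default policies first, so the
iptables -P FORWARD ACCEPT   # flush below cannot lock you out
iptables -P OUTPUT ACCEPT
iptables -F                  # flush all rules in the filter table
iptables -X                  # delete user-defined chains
EOF
chmod +x flush-iptables.sh
```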
Test to make sure you are able to ssh and ping this server from outside.
When we are done with this example, you’ll only be able to SSH to this server. You’ll not be able to ping this server from outside.

2. Allow only SSH

Allow only the incoming SSH connection to this server. You can ssh to this server from anywhere.
iptables -A INPUT -i eth0 -p tcp --dport 22 -j ACCEPT
The above iptables command has the following 4 components.
  • “-A INPUT” – This indicates that we are appending (adding) a new rule to the INPUT chain. So, this rule is for incoming traffic.
  • “-i eth0” – Incoming packets through the interface eth0 will be checked against this rule.
  • “-p tcp --dport 22” – This rule is for TCP packets. It has one tcp option, “--dport 22”, which indicates that the destination port for this rule on the server is 22 (which is ssh).
  • “-j ACCEPT” – Jump to the ACCEPT target, which simply accepts the packet.
In simple terms the above rule can be stated as: All incoming packets through eth0 for ssh will be accepted.

3. Drop all Other Packets

Once you’ve specified your custom rules to accept packets, you should also have a default rule to drop any other packets.
This should be your last rule in the INPUT chain.
To drop all incoming packets, do the following.
iptables -A INPUT -j DROP

4. View the SSH rule and Test

To view the current iptables firewall rules, use “iptables -L” command.
# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ssh
DROP       all  --  anywhere             anywhere
As you see from the above output, it has the following two rules in sequence.
  • Accept all incoming ssh connections
  • Drop all other packets.
Instead of adding the firewall rules from the command line, it might be better to create a shell script that contains your rules as shown below.
# vi iptables.sh
iptables -A INPUT -i eth0 -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -j DROP

# sh -x iptables.sh
+ iptables -A INPUT -i eth0 -p tcp --dport 22 -j ACCEPT
+ iptables -A INPUT -j DROP

# iptables -L INPUT
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ssh
DROP       all  --  anywhere             anywhere
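One hedged safety trick when testing rules like these over SSH is to schedule an automatic flush before loading them, so a mistake undoes itself. A sketch (written to a file rather than executed, since loading rules requires root; the 60-second window is an arbitrary choice):

```shell
# Write an apply script with a lock-out watchdog.
cat > apply-rules.sh <<'EOF'
#!/bin/sh
# Safety net: if these rules lock us out, flush everything after 60 seconds.
( sleep 60 && iptables -F ) &
WATCHDOG=$!

iptables -A INPUT -i eth0 -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -j DROP

# Still connected? Cancel the watchdog to keep the rules.
printf 'Press Enter within 60 seconds to keep these rules: '
read confirm
kill "$WATCHDOG"
EOF
chmod +x apply-rules.sh
```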
Similar to the iptables append/add command, there are a few other commands available for iptables. I’ll cover them in upcoming articles in the iptables series. I’ll also provide several practical firewall rule examples that will be helpful in real-life scenarios.
Previous articles in the iptables series: