QCRI Arabic Language Technologies

Tools & Demos "FARASA"


How to use FARASA Packages

All Farasa packages (the segmentation module, the POS tagger, and the parser) are used in almost the same way. You can use each package either as a standalone application or as a library inside other software.

The instructions below cover Linux, Mac, and Windows only. We built and ran the Farasa packages with Java 7 and Java 8; earlier Java versions may not be able to build the packages because of some of their dependencies.
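Since only Java 7 and Java 8 are reported as tested, a program that embeds Farasa might check the running JVM up front. The sketch below is a hypothetical helper (the class and method names are ours, not part of Farasa). It reads the standard "java.specification.version" system property, which is "1.7" or "1.8" on those releases and a plain number such as "11" from Java 9 onward. Accepting versions newer than 8 is an assumption: the text above only reports testing on 7 and 8.

```java
// Hypothetical helper (not part of Farasa): checks that the running JVM is at
// least Java 7, the oldest version mentioned as tested for Farasa.
final class JavaVersionCheck {

    /** Parses "1.7", "1.8", "11", "17.0.2" style version strings to a major version number. */
    static int majorVersion(String spec) {
        String[] parts = spec.split("\\.");
        // Java 8 and earlier use the "1.x" scheme; Java 9+ put the major number first.
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return Integer.parseInt(parts[0]);
    }

    /** True if the current JVM reports Java 7 or newer. */
    static boolean isSupported() {
        return majorVersion(System.getProperty("java.specification.version")) >= 7;
    }
}
```

For example, calling JavaVersionCheck.isSupported() at startup lets an application fail early with a clear message instead of hitting a class-loading error later.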

Farasa Segmenter Module

There are two options to download Farasa Segmenter: either download just the jar file, or download the entire source code as a compressed archive, "FarasaSegmenter.tar.gz". If you download the jar file, you can skip the build step.

If you download the source code of Farasa Segmenter from the link sent to your email (during registration), extract the archive, change to the project's home directory "FarasaSegmenter", and execute the following commands in the terminal to compile the source code and build the jar file:

ant clean
ant jar

There are two ways to run the package as a standalone application. For interactive mode, run the following command:

java -jar dist/farasaSeg.jar

Alternatively, pass a UTF-8 encoded text file as input to the package and specify the output file name as follows:

java -jar dist/farasaSeg.jar -i <inputfile> -o <output_file>
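Because the input file must be UTF-8 encoded, it helps to write and read these files with an explicit charset instead of relying on the platform default, which before Java 18 is often not UTF-8 on Windows. A minimal sketch using java.nio (available from Java 7); the class and method names are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: prepares UTF-8 input for the Farasa command-line tools
// and reads their output back as UTF-8.
final class Utf8Files {

    /** Writes one line per list element, always encoded as UTF-8. */
    static void writeInput(Path input, List<String> lines) throws IOException {
        Files.write(input, lines, StandardCharsets.UTF_8);
    }

    /** Reads every line of the output file, decoding it as UTF-8. */
    static List<String> readOutput(Path output) throws IOException {
        return Files.readAllLines(output, StandardCharsets.UTF_8);
    }
}
```

Write the text with writeInput, run the command above on that file, then load the result with readOutput.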

To use the Farasa segmentation package as a library in your application, build it as described above (or with the shell script "make.sh"), then add the jar file farasaSeg.jar to your project. The following short example shows how to use the Farasa segmentation package:

package tryingfarasa;

import com.qcri.farasa.segmenter.Farasa;
import java.io.IOException;
import java.util.ArrayList;

public class TryingSeg {

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        // Create the segmenter once and reuse it for every line.
        Farasa farasa = new Farasa();

        // Segment one line of Arabic text ("the text to be processed").
        ArrayList<String> output = farasa.segmentLine("النص المراد معالجته");

        // Print each segmented token on its own line.
        for (String s : output) {
            System.out.println(s);
        }
    }
}

Farasa POS Tagger

As with Farasa Segmenter, there are two options to use Farasa POS Tagger: either download just the jar file, or download the entire source code as a compressed archive, "FarasaPOS.tar.gz". If you download the jar file, you can skip the build step. You only need to create a directory named "lib" at the same level as the jar file and copy the Farasa Segmenter jar file into it. In addition, download the file "weka.jar" and place it in the same "lib" directory.
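Before running the POS Tagger jar, you could verify that the "lib" directory is complete. This hypothetical helper only checks that the named files exist; it assumes the segmenter jar keeps the name farasaSeg.jar used earlier and that weka.jar is the only other dependency, as the paragraph above lists.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which required jars are missing from the "lib"
// directory that sits next to the Farasa POS Tagger jar.
final class LibCheck {

    static List<String> missingJars(Path libDir, List<String> requiredJars) {
        List<String> missing = new ArrayList<>();
        for (String jar : requiredJars) {
            // A jar counts as present only if it is a regular file inside libDir.
            if (!Files.isRegularFile(libDir.resolve(jar))) {
                missing.add(jar);
            }
        }
        return missing;
    }
}
```

For example, missingJars(Paths.get("lib"), Arrays.asList("farasaSeg.jar", "weka.jar")) returns an empty list when both files are in place.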

If you download the source code of Farasa POS Tagger from the link sent to your email (during registration), extract the archive, change to the project's home directory "FarasaPOS", and execute the following commands in the terminal to compile the source code and build the jar file:

ant clean
ant jar

To run the package as a standalone application, pass a UTF-8 encoded text file as input and specify the output file name as follows:

java -jar dist/FarasaPOSJar.jar -i <inputfile> -o <output_file>

To use Farasa POS Tagger as a library in your application, build it as described above (or with the shell script "make.sh"), then add the jar file FarasaPOS.jar to your project. The following short example shows how to use the Farasa POS module:

package tryingfarasa;

import com.qcri.farasa.segmenter.Farasa;
import com.qcri.farasa.pos.FarasaPOSTagger;
import com.qcri.farasa.pos.Sentence;
import com.qcri.farasa.pos.Clitic;
import java.util.ArrayList;

public class TryingFarasaPOS {
    public static void main(String[] args) throws Exception {
        // The POS tagger reuses the segmenter instance.
        Farasa farasa = new Farasa();
        FarasaPOSTagger farasaPOS = new FarasaPOSTagger(farasa);

        // Segment the text first, then tag the segmented output.
        ArrayList<String> segOutput = farasa.segmentLine("النص المراد معالجته");
        Sentence sentence = farasaPOS.tagLine(segOutput);

        // Print each clitic as surface/POS, appending -genderNumber when it is set.
        // isEmpty() compares string contents; != "" would compare object references.
        for (Clitic w : sentence.clitics) {
            String tag = w.surface + "/" + w.guessPOS;
            if (w.genderNumber != null && !w.genderNumber.isEmpty()) {
                tag += "-" + w.genderNumber;
            }
            System.out.println(tag);
        }
    }
}
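The POS example above prints each clitic as surface/POS, followed by -genderNumber when a gender/number value is present. The sketch below isolates that formatting as a plain-string function so it can be tested without Farasa; the class name is hypothetical. It checks emptiness with isEmpty(), because in Java comparing Strings with != "" tests reference identity rather than content and can give the wrong answer for strings built at runtime.

```java
// Hypothetical helper: formats one tagged clitic as "surface/POS" plus
// "-genderNumber" when that field is non-empty.
final class PosFormat {

    static String format(String surface, String pos, String genderNumber) {
        StringBuilder sb = new StringBuilder(surface).append('/').append(pos);
        // Compare contents, not references; treat null like an empty value.
        if (genderNumber != null && !genderNumber.isEmpty()) {
            sb.append('-').append(genderNumber);
        }
        return sb.toString();
    }
}
```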

Farasa Diacritizer

Farasa Diacritizer relies internally on a language model to make its decisions. Farasa uses the KenLM toolkit for this; the toolkit is already compiled, and binaries for Mac, Linux, and Windows are provided in Farasa's "data" directory. Farasa detects the operating system automatically and uses the matching binary. Unlike the previous Farasa packages, Farasa Diacritizer needs to know the directory where the KenLM binary and the Farasa language model file are placed. In standalone mode, provide the extra parameter -farasalm <dir absolute path> in addition to the input text file and the output file name, as follows:

java -jar dist/FarasaDiacritize.jar -i <inputfile> -o <output_file> -farasalm <dir absolute path>
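If you want to drive the standalone diacritizer from another Java program, one option is to build its argument list and start it with java.lang.ProcessBuilder. In this sketch the jar location is a parameter, so no particular jar file name is assumed; the -i, -o and -farasalm flags are the ones described in the paragraph above, and placing -farasalm last is an assumption. The helper rejects relative paths because -farasalm expects an absolute directory path. The class name is hypothetical.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the command line for the Farasa Diacritizer
// standalone mode, including the -farasalm directory it requires.
final class DiacritizerCommand {

    static List<String> build(Path jar, Path input, Path output, Path farasaLmDir) {
        // -farasalm must point at an absolute directory path.
        if (!farasaLmDir.isAbsolute()) {
            throw new IllegalArgumentException("-farasalm expects an absolute path: " + farasaLmDir);
        }
        return new ArrayList<>(Arrays.asList(
                "java", "-jar", jar.toString(),
                "-i", input.toString(),
                "-o", output.toString(),
                "-farasalm", farasaLmDir.toString()));
    }
}
```

The resulting list can be started with new ProcessBuilder(cmd).inheritIO().start().waitFor(), which runs the jar and forwards its console output.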

You can download the different binaries and the language model from this link.

However, these binaries may not work properly on your particular OS version. If you run into problems, compile the KenLM toolkit yourself and replace the provided binary with your compiled version, keeping the same file name.
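A minimal sketch of that replacement step, assuming you already know the path of the bundled binary and of your compiled one (both are placeholders here, since this guide does not list the binary file names). It keeps a backup with a ".orig" suffix, copies the new binary under the original name, and marks it executable. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: swaps the bundled KenLM binary for one you compiled,
// keeping the original file name so Farasa still finds it.
final class BinarySwap {

    /** Returns the path of the backup made from the bundled binary. */
    static Path replaceKeepingName(Path bundled, Path compiled) throws IOException {
        // Keep the shipped binary next to the new one, e.g. "name" -> "name.orig".
        Path backup = bundled.resolveSibling(bundled.getFileName() + ".orig");
        Files.move(bundled, backup, StandardCopyOption.REPLACE_EXISTING);
        // Put the compiled binary in place under the bundled binary's name.
        Files.copy(compiled, bundled, StandardCopyOption.REPLACE_EXISTING);
        // Request the executable bit; this returns false where the file system does not support it.
        bundled.toFile().setExecutable(true);
        return backup;
    }
}
```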

To use Farasa Diacritizer as a library in your application, build it (or download the prebuilt jar), then add the jar file FarasaDiacritize.jar to your project. The following short example shows how to use the Farasa Diacritizer module:

package tryingfarasa;

import com.qcri.farasa.segmenter.Farasa;
import com.qcri.farasa.pos.FarasaPOSTagger;
import com.qcri.farasa.diacritize.DiacritizeText;

public class TryingFarasaDiacritizer {
    public static void main(String[] args) throws Exception {
        // The diacritizer builds on the segmenter and the POS tagger.
        Farasa farasa = new Farasa();
        FarasaPOSTagger farasaPOS = new FarasaPOSTagger(farasa);

        // Directory holding the KenLM binary and the Farasa language model file.
        String dataDirectory = "/var/www/farasa/data/";
        DiacritizeText dt = new DiacritizeText(dataDirectory, "all-text.txt.nocase.blm", farasa, farasaPOS);

        String diacritized = dt.diacritize("النص المراد معالجته");
        System.out.println(diacritized);
    }
}

Farasa Constituency Parser

Farasa Named-Entity Recognizer

If you use just the jar file, download it and create a directory named "lib" next to it. In addition, download the following jar files and place them in the lib directory:

In the command prompt, navigate to the directory containing the jar file and run the following command:

java -jar FarasaNERJar.jar -i <inputfile> -o <output_file>

To use Farasa NER within your application, see the following example:

package tryingfarasa;

import com.qcri.farasa.segmenter.Farasa;
import com.qcri.farasa.pos.FarasaPOSTagger;
import com.qcri.farasa.ner.ArabicNER;
import java.util.ArrayList;

public class TryingFarasaNER {
    public static void main(String[] args) throws Exception {
        // NER builds on the segmenter and the POS tagger.
        Farasa segmenter = new Farasa();
        FarasaPOSTagger tagger = new FarasaPOSTagger(segmenter);
        ArabicNER ner = new ArabicNER(segmenter, tagger);

        ArrayList<String> output = ner.tagLine("النص المراد معالجته");

        // Print the tagged tokens on one line, with a space before every token except the first.
        int loc = 0;
        for (String s : output) {
            String separator = (loc == 0) ? "" : " ";
            System.out.print(separator + s.trim());
            loc++;
        }
        System.out.println();
    }
}
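The loop in the NER example puts a space before every tagged token except the first. The same logic as a standalone, testable function; the class name is hypothetical, and any token strings you feed it in tests are placeholders, since the exact ArabicNER output format is not documented here. On Java 8 and later, String.join(" ", tokens) does the joining, but not the per-token trim().

```java
import java.util.List;

// Hypothetical helper: joins trimmed tokens with single spaces,
// with no separator before the first token.
final class TokenJoin {

    static String joinTokens(List<String> tokens) {
        StringBuilder line = new StringBuilder();
        boolean first = true;
        for (String token : tokens) {
            if (!first) {
                line.append(' ');
            }
            line.append(token.trim());
            first = false;
        }
        return line.toString();
    }
}
```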