====== Installing HTSeq ======

HTSeq is a tool for "Analysing high-throughput sequencing data with Python". It consists of a Python module to import and two wrapper scripts to run on the command line. Here we show how to get this tool and install it as a VALET package for your research group.

The commands to set up, install and create the VALET package are in three bash script files:

  * **t1_setup** - Set shell variables and set up all necessary directories (downloading as needed).
  * **t2_install** - Build the Python module and install it in your research software directory.
  * **t3_valet** - Write a VALET package file to set up the environment to use HTSeq.

Once you have bash source files for these tasks, you can set up, install and create the VALET package file with one compound command:

<code bash>
. t1_setup && . t2_install && . t3_valet
</code>

These files document how the package was built, and can be used to build a new version or rebuild the old one.

<note tip>
The Mills work directory for your research group is on the ''/lustre'' file system and is not backed up. These three script files are quite small, and will rebuild all directories. Keep these files on a system that is backed up, for example, your home directory on Mills.
</note>

<note warning>These scripts are sourced. This means that all variables set, and all VALET packages added, will be in your environment on completion. In fact, this is how the setup variables are passed to the install and valet scripts. This is also why there is the ''vpkg_rollback all'' command before any VALET commands in the install script.</note>

===== Task 1: setup =====

The bash source file ''t1_setup'' has shell variable assignments and a few directory "setup" commands. The variable assignments define the scratch, installation and VALET directories. The other values will be used to write the VALET package file.

There are two environment variables used in this source file: ''USER'' and ''WORKDIR''. Since ''WORKDIR'' may not be set, you should source these files from a workgroup shell. When in a workgroup shell the directory ''${WORKDIR}/sw/valet'' has a special meaning: it is searched by VALET for vpkg files. You may use a different directory by adding it to your ''VALET_SYSCONFDIR'' environment variable.

<file bash "t1_setup">
# package variables:
id=htseq
version=0.5.3p9
description='Analysing high-throughput sequencing data with Python'
binaries="htseq-count htseq-qa"
url_doc='http://www-huber.embl.de/users/anders/HTSeq/doc/'
dir="HTSeq-$version"
url_get="http://pypi.python.org/packages/source/H/HTSeq/$dir.tar.gz"

# scratch variables:
scratch="/lustre/scratch/$USER/$id"
alias clean="lrm -r $scratch"

# install variables:
compiler='gcc/4.6.2'
require='numpy/1.6.1-2.7'
prefix="${WORKDIR}/sw/$id"
installhome="$prefix/$version"
pythondir="$installhome/lib/python"

# VALET variables.
valetdir="${WORKDIR}/sw/valet"

test -z "$WORKDIR" && echo "WORKDIR not set, start a workgroup shell." && return 1

# setup directories
mkdir -p -m 1777 "$valetdir"
mkdir -p "$installhome/bin" "$pythondir"
mkdir -p "$scratch"
test -d "$scratch/$dir" || curl "$url_get" | tar -zxf - -C "$scratch"
</file>

Directory setup notes:

=== Call-by-need evaluation ===

Bash evaluates the ''||'' and ''&&'' operators with short-circuit evaluation, a form of call-by-need evaluation (also called [[http://en.wikipedia.org/wiki/Lazy_evaluation|lazy evaluation]]). For example:

<code bash>
test -d "$scratch/$dir" || curl "$url_get" | tar -zxf - -C "$scratch"
</code>

The pipeline after the **or** operator ''||'' runs only if the ''test'' command fails. Because ''|'' binds more tightly than ''||'', the whole ''curl ... | tar ...'' pipeline is the right-hand side. If the directory does not exist, the archive is downloaded, uncompressed and extracted into the scratch directory. This creates the ''$scratch/$dir'' directory, so the next time you source this file it will not download and extract the archive again.
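The same guard pattern can be tried without any network access. The following is a minimal sketch, not part of ''t1_setup'': the function ''fetch_and_extract'', the counter ''calls'' and the directory names are hypothetical stand-ins for the real ''curl | tar'' pipeline and scratch directory.

<code bash>
# Hypothetical stand-in for the "curl | tar" pipeline in t1_setup:
# it creates the target directory and counts how often it was called.
calls=0
fetch_and_extract() {
    calls=$((calls + 1))
    mkdir -p "$1"
}

demo_scratch=$(mktemp -d)          # throwaway scratch directory
target="$demo_scratch/HTSeq-demo"  # illustrative directory name

# First pass: the directory is missing, so the right-hand side runs.
test -d "$target" || fetch_and_extract "$target"

# Second pass: the directory exists, so the right-hand side is skipped.
test -d "$target" || fetch_and_extract "$target"

echo "fetch_and_extract ran $calls time(s)"
</code>

The second guard finds the directory created by the first, so the stand-in runs only once. This is why sourcing ''t1_setup'' repeatedly does not download the archive again.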
=== The -p option on mkdir ===

Normally, the ''mkdir'' command fails, with an error message, when the directory already exists. The ''-p'' (or <nowiki>--</nowiki>''parents'') option causes ''mkdir'' to exit normally when the directory exists; otherwise it creates the directory along with any missing parent directories. You will get an error if one of the parent directories does not exist and you do not have permission to create it. The error message:

<code text>
mkdir: cannot create directory `/lustre/work/it_css/sw/htseq/': Permission denied
</code>

refers to a parent directory whose permissions do not let you create entries in it. If you do not own that directory, it should be writable by you and have the sticky bit set.

=== The -m 1777 option on mkdir ===

Normally, any new directory will be readable and searchable by everybody, but writable only by you, as the owner of the directory. This is a good choice for the install directories, since it means your group can use the installed package, but only you can rebuild or maintain the package. However, directories that are shared and writable should have the sticky bit set. The sticky bit on a directory (also called the restricted deletion flag) prevents an unprivileged user from removing or renaming a file in the directory unless they own the file or the directory. See **''man chmod''**. The ''-m 1777'' option on ''mkdir'' sets the newly created directory to be writable by everyone (owner, group and others) with restricted deletion. Parent directories that ''-p'' creates along the way are not affected by ''-m''; they get the default permissions.

===== Task 2: install =====

The bash source file ''t2_install'' assumes the setup task file was sourced.

<file bash "t2_install">
# $scratch/$dir set to scratch directory for installation
# $require and $compiler set to VALET dependent packages
# $pythondir set to Python install directory
# $installhome set to package install directory

pushd "$scratch/$dir"
vpkg_rollback all
vpkg_require $require $compiler
export PYTHONPATH=$PYTHONPATH:$pythondir
python setup.py install --home="$installhome"
popd
</file>

Installation notes:

=== pushd .. popd ===

The installation is done in a scratch directory. The ''pushd'' command changes to that directory after pushing the current working directory on a stack. The ''popd'' command brings you back to the directory you started from. When done, you can use the ''clean'' alias set in ''t1_setup'' to remove the scratch directory.

=== vpkg commands ===

These installation commands come from the HTSeq installation instructions. The instructions to download and install a dependent package are skipped; instead, the command ''vpkg_require'' is used to add packages that do not need installation. Make sure you read the instructions carefully. It is best to install from source, and you cannot install in an area that requires root (or sudo) access.

===== Task 3: valet =====

The bash source file ''t3_valet'' assumes the shell variables have been assigned in the setup source file. It contains just one ''cat'' command to write out a complete VALET package file with one version.

<file bash "t3_valet">
# $valetdir set to VALET directory (should be in $VALET_SYSCONFDIR)
# $id set to VALET package-id
# $description set to package description
# $binaries set to a list of the package executables
# $prefix set to prefix for install location
# $prefix/$version is set to the install location
# $require and $compiler set to VALET dependent packages
# $pythondir set to the package python directory.

cat > "$valetdir/$id.vpkg" <<EOT
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.udel.edu/xml/valet/1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.udel.edu/xml/valet/1.0 http://www.udel.edu/xml/valet/1.0/schema.xsd"
    id="$id">
  <description>$description</description>
  <url>$url_doc</url>
  <prefix>$prefix</prefix>
  <version id="$version">
    <description>$binaries</description>
    <dependencies>
      <package id="$require"/>
      <package id="$compiler"/>
    </dependencies>
    <export variable="PYTHONPATH" action="path-append">$pythondir</export>
  </version>
</package>
EOT
</file>

VALET file creation notes:

=== here document redirect ===

This script contains just one ''cat'' command with a [[http://en.wikipedia.org/wiki/Here_document#Unix_shells|here-document]] redirect. The resulting ''htseq.vpkg'' file will contain the lines in the source (between ''<<EOT'' and ''EOT'') with the shell variables expanded to the values set in ''t1_setup''. In many cases, the names of the variables match the element names in the vpkg file.

=== XML documents ===

VALET uses [[http://www.w3schools.com/xml/| XML]] to describe the package, which may contain several versions. It also uses an [[http://www.w3schools.com/schema/|XML schema]] to describe the structure of the XML document.

=== More than one version ===

To install a new version, you should first copy this vpkg file to a safe location, because ''t3_valet'' overwrites it. Then change the ''t1_setup'' script to set the variables for the new version. Remember to check for a new URL location and new requirements. Then run all three steps. Finally, merge the saved vpkg file with the new vpkg file to use both versions. The first version is the default.
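The two points above, variable expansion inside the here-document and the need to save the old vpkg file before rebuilding, can be seen in a small sketch. It assumes nothing about VALET itself: the function ''write_vpkg'', the file names and the version numbers ''1.0'' and ''2.0'' are hypothetical, and the one-line fragment it writes is not a complete VALET package file.

<code bash>
demo_dir=$(mktemp -d)          # throwaway directory standing in for $valetdir
vpkg="$demo_dir/demo.vpkg"     # hypothetical package file name

# Same pattern as t3_valet: "cat >" truncates the file, and the unquoted
# EOT delimiter lets the shell expand $version inside the here-document.
write_vpkg() {
    cat > "$vpkg" <<EOT
<version id="$version"/>
EOT
}

version=1.0
write_vpkg
cp "$vpkg" "$demo_dir/demo-1.0.vpkg"   # save the old file before rebuilding

version=2.0
write_vpkg                             # overwrites demo.vpkg

# The saved copy still holds the old version; the rebuilt file holds the new one.
cat "$demo_dir/demo-1.0.vpkg" "$vpkg"
</code>

After the second call the package file contains only the new version, which is why the saved copy is needed for the manual merge of the two ''<version>'' elements.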
====== Testing HTSeq ======

===== VALET versions =====

When the htseq VALET package is installed and ''VALET_SYSCONFDIR'' is set, the two commands

<code>
workgroup -g it_css
vpkg_versions htseq
</code>

should yield:

<code>
Available versions in package (* = default version):

[/lustre/work/it_css/sw/valet/htseq.vpkg]
htseq  HTSeq: Analysing high-throughput sequencing data with Python
* 0.5.3p9  htseq-count htseq-qa
</code>

<note warning>**ERROR: unknown package: htseq**

You will get this error message if you are not in a workgroup shell, i.e., ''$WORKDIR'' is not set, or if you installed the VALET package in a different directory. You can use any directory, but you must configure VALET to use your directory. For example, add these lines to your ''.bash_profile'':

<code>
#Set my VALET config directories, to find vpkg files.
export VALET_SYSCONFDIR=/archive/it_css/sw/valet:~trainf/.valet
</code>
</note>

===== HTSeq binaries =====

There are two binaries in HTSeq. To test whether ''htseq-count'' runs:

<code>
vpkg_require htseq
htseq-count --help
</code>

The binary ''htseq-qa'' needs an additional library, ''matplotlib'' (the plots are output as a PDF file). So to test:

<code>
vpkg_require htseq matplotlib
htseq-qa --help
</code>

<note warning>**This script needs the 'matplotlib' library, which was not found. Please install it.**

Without matplotlib added to your environment, you get this message.
</note>

<note tip>
You may want to add ''matplotlib'' as a dependency to your ''htseq'' package. It was not mentioned on the htseq web page as a dependency, but it is needed for ''htseq-qa'' (and for the tour using Python). At this point, it is easy to just add it to the ''htseq.vpkg'' file. Between ''<dependencies>'' and ''</dependencies>'' add the line:

<code>
<package id="matplotlib/1.1.0-2.7.2"/>
</code>

It would be better to go back and add it to the install task scripts. To ''t1_setup'' add:

<code>
plotrequire='matplotlib/1.1.0-2.7.2'
</code>

to ''t3_valet'' add:

<code>
<package id="$plotrequire"/>
</code>

and then rebuild the VALET package file with the bash commands

<code>
. t1_setup && . t3_valet
</code>

Clearly, these three steps are more than just a simple change to one file, but this documents what you did, and you will not forget to add this change for the next version.
</note>

===== A tour through HTSeq =====

See [[http://www-huber.embl.de/users/anders/HTSeq/doc/tour.html|A tour through HTSeq]] for a tour that demonstrates the functionality of HTSeq. To follow along on a Mills compute node, you must download the example data and add the VALET packages ''htseq'' and ''matplotlib'' (the demonstration uses ''pyplot'' in the ''matplotlib'' module).

<code bash>
qlogin
export VALET_SYSCONFDIR=$WORKDIR/sw/valet
vpkg_require htseq matplotlib
curl http://www-huber.embl.de/users/anders/HTSeq/HTSeq_example_data.tgz | tar -zxf -
</code>

Following the example from the HTSeq tour site:

<code>
[dnairn@n015 ~]$ python
Python 2.7.6 (default, Feb 12 2014, 12:13:46)
[GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import HTSeq
>>> fastq_file = HTSeq.FastqReader( "yeast_RNASeq_excerpt_sequence.txt", "solexa" )
>>> fastq_file
<FastqReader object, connected to file name 'yeast_RNASeq_excerpt_sequence.txt'>
>>> import itertools
>>> for read in itertools.islice( fastq_file, 10 ):
...    print read
...
CTTACGTTTTCTGTATCAATACTCGATTTATCATCT
AATTGGTTTCCCCGCCGAGACCGTACACTACCAGCC
TTTGGACTTGATTGTTGACGCTATCAAGGCTGCTGG
ATCTCATATACAATGTCTATCCCAGAAACTCAAAAA
AAAGTTCGAATTAGGCCGTCAACCAGCCAACACCAA
GGAGCAAATTGCCAACAAGGAAAGGCAATATAACGA
AGACAAGCTGCTGCTTCTGTTGTTCCATCTGCTTCC
AAGAGGTTTGAGATCTTTGACCACCGTCTGGGCTGA
GTCATCACTATCAGAGAAGGTAGAACATTGGAAGAT
ACTTTTAAAGATTGGCCAAGAATTGGGGATTGAAGA
>>> ...
</code>

software/python-htseq/python-htseq.txt Last modified: 2017-10-23 18:03 by sraskar