Usage and Examples

Quick Guide

Download PDB files: get
Coordinate extraction: extract
Sequence extraction: extract-seq
Chain manipulation: rename-chain, renumber-residues
Version info: version
Other: completion

pdbtk Usage

pdbtk -- a cross-platform, efficient and practical PDB structure file manipulation toolkit

Version: 0.1.1
Author: Perry
Source code: https://github.com/perry/pdbtk

pdbtk is a command-line toolkit for manipulating PDB structure files.
It provides various operations for extracting, filtering, and transforming protein structure data.

Usage:
  pdbtk [command]

Available Commands:
  get               Download a PDB file from the RCSB PDB database
  extract           Extract chains from a PDB file
  extract-seq       Extract sequences from chains in a PDB file
  rename-chain      Rename a chain in a PDB file
  renumber-residues Renumber residues in a PDB file
  version           Print the version number
  completion        Generate the autocompletion script for the specified shell
  help              Help about any command

Flags:
  -h, --help   help for pdbtk

Use "pdbtk [command] --help" for more information about a command.

get Usage

Download a PDB file from the RCSB PDB database using the PDB code.
The file will be downloaded from https://files.rcsb.org/download/{pdb_code}.{format}

By default, the file is saved as {pdb_code}.{format} in the current directory.
Use --output to specify a different filename or "-" to output to stdout.
Use --format to specify the file format (pdb, pdb.gz).

Usage:
  pdbtk get [flags] <pdb_code>

Flags:
  -f, --format string   File format: pdb, pdb.gz (default: pdb)
  -h, --help            help for get
  -o, --output string   Output file (default: {pdb_code}.{format}, use '-' for stdout)

Examples

Download 1A02 as PDB file

$ pdbtk get 1A02

Download as compressed PDB file

$ pdbtk get --format pdb.gz 1A02

Download to stdout and view the first 10 lines with head

$ pdbtk get --output - 1A02 | head

Download to specific filename

$ pdbtk get --output my_structure.pdb 1A02

Download the gzipped PDB, uncompress it and extract chain B in a single command

$ pdbtk get --format pdb.gz -o - 1A02 | gunzip -c - | pdbtk extract --chains B
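The URL template and default filename described above can be sketched in plain shell, which may help when checking what get is expected to fetch. This is only an illustration and not pdbtk's implementation: the helper names rcsb_url and default_output are hypothetical, and the only formats accepted are the two listed for --format.

```shell
# Hypothetical helper: build the download URL from the documented template
# https://files.rcsb.org/download/{pdb_code}.{format}
rcsb_url() {  # usage: rcsb_url PDB_CODE [FORMAT]
  code=$1
  format=${2:-pdb}              # pdb is the documented default format
  case $format in
    pdb|pdb.gz)
      printf 'https://files.rcsb.org/download/%s.%s\n' "$code" "$format" ;;
    *)
      echo "unsupported format: $format" >&2
      return 1 ;;
  esac
}

# Hypothetical helper: default output name, {pdb_code}.{format}
default_output() {  # usage: default_output PDB_CODE [FORMAT]
  printf '%s.%s\n' "$1" "${2:-pdb}"
}

rcsb_url 1A02
rcsb_url 1A02 pdb.gz
default_output 1A02 pdb.gz
```

The printed URL can be passed to any HTTP client to compare a manual download against the file written by pdbtk get.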

extract Usage

Extract specific chains from a PDB structure file.
The output is written to stdout unless an output file is specified.
If no input file is specified, reads from stdin.

Usage:
  pdbtk extract [flags] [input_file]

Flags:
  -c, --chains string   Comma-separated list of chain IDs to extract (required)
      --chain string    Alias for --chains
  -h, --help            help for extract
  -o, --output string   Output file (default: stdout)
      --altloc string   Filter by alternative location (ALTLOC) identifier (e.g., A, B) or 'first' to take first ALTLOC when duplicates exist

Examples

Extract chains A, B, and C to a file

$ pdbtk extract --chains A,B,C --output 1a02_chainABC.pdb 1a02.pdb

Extract chains A, B, and C to stdout

$ pdbtk extract --chains A,B,C 1a02.pdb > 1a02_chainABC.pdb

Extract from stdin

$ cat 1a02.pdb | pdbtk extract --chains A,B,C

Extract only ALTLOC B atoms

$ pdbtk extract --chains A --altloc B 1a02.pdb

Extract first ALTLOC when duplicates exist

$ pdbtk extract --chains A --altloc first 1a02.pdb

Extract using --chain alias

$ pdbtk extract --chain A,B,C --output 1a02_chainABC.pdb 1a02.pdb
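For readers who want to sanity-check extract output, the selection can be approximated with awk, because the PDB format stores the chain ID in column 22 and the ALTLOC identifier in column 17 of ATOM and HETATM records. The sketch below is not pdbtk's implementation. The helpers atom, keep_chains and first_altloc are hypothetical, and reading "first" as "keep the first record seen for each chain, residue number and atom name" is an assumption.

```shell
# Hypothetical helper: print one fixed-width ATOM record (PDB columns 1-54).
# Arguments: serial, atom name (4 chars), altLoc, resName, chain, resSeq.
atom() {
  printf 'ATOM  %5d %-4s%1s%3s %1s%4d    %8.3f%8.3f%8.3f\n' \
    "$1" "$2" "$3" "$4" "$5" "$6" 0 0 0
}

# Tiny demo structure: residue 1 of chain A has two alternative locations.
demo=$(mktemp)
{
  atom 1 ' CA ' A   SER A 1
  atom 2 ' CA ' B   SER A 1
  atom 3 ' CA ' ' ' GLY A 2
  atom 4 ' CA ' ' ' MET B 1
} > "$demo"

# Keep coordinate records whose chain ID (column 22) is in a comma-separated list.
keep_chains() {  # usage: keep_chains A,B < in.pdb
  awk -v chains="$1" '
    BEGIN { n = split(chains, ids, ","); for (i = 1; i <= n; i++) keep[ids[i]] = 1 }
    /^(ATOM|HETATM)/ && (substr($0, 22, 1) in keep)'
}

# Assumed meaning of "first": drop later records that repeat the same
# chain + residue number + insertion code (columns 22-27) and atom name (13-16).
first_altloc() {
  awk '/^(ATOM|HETATM)/ {
         key = substr($0, 22, 6) substr($0, 13, 4)
         if (seen[key]++) next
       }
       { print }'
}

keep_chains A < "$demo" | first_altloc
```

Comparing the record count of such a filter with the output of pdbtk extract is a quick way to spot unexpected differences, for example in how ALTLOC duplicates are handled.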

extract-seq Usage

Extract sequences from chains in a PDB structure file.
The output is in FASTA format with sequence IDs in the format: >{pdbfilename_no_dotpdb}_{chain}

If no chains are specified, all chains will be extracted.
If no input file is specified, reads from stdin.

Usage:
  pdbtk extract-seq [flags] [input_file]

Flags:
  -c, --chains string   Comma-separated list of chain IDs to extract (default: all chains)
      --chain string    Alias for --chains
  -h, --help            help for extract-seq
  -o, --output string   Output file (default: stdout)
      --seqres          Use SEQRES records instead of ATOM records

Examples

Extract sequences from all chains

$ pdbtk extract-seq 1a02.pdb > 1a02.fasta

Extract sequences from specific chains A, B, and C

$ pdbtk extract-seq --chains A,B,C 1a02.pdb > 1a02_chainABC.fasta

Extract all chains to a file

$ pdbtk extract-seq --output 1a02_all.fasta 1a02.pdb

Extract from stdin

$ cat 1a02.pdb | pdbtk extract-seq --chains B,C

Extract sequences using SEQRES records

$ pdbtk extract-seq --seqres 1a02.pdb

Extract sequences using --chain alias

$ pdbtk extract-seq --chain A,B 1a02.pdb > 1a02_chainAB.fasta

Extract sequences from multiple PDB files in the current directory

$ find . -name "*.pdb" -exec pdbtk extract-seq {} \; > myseqs.fasta

Note on sequence extraction:
- By default, extract-seq extracts sequences from ATOM records with gap characters (-) inserted for missing residue numbers.
- Use --seqres to extract from SEQRES records instead (which contain the full sequence including regions not present in ATOM records).
- If --seqres is specified but no SEQRES records are present, a warning is printed and no sequence is returned.
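The gap behaviour for ATOM-based extraction can be illustrated with a small awk sketch. It is not pdbtk's implementation: the helpers atom and atom_seq are hypothetical, one "-" per missing residue number is an assumption the text does not spell out, and insertion codes and non-standard residues are ignored (unknown residue names become X).

```shell
# Hypothetical helper: print one fixed-width ATOM record (PDB columns 1-54).
# Arguments: serial, atom name (4 chars), altLoc, resName, chain, resSeq.
atom() {
  printf 'ATOM  %5d %-4s%1s%3s %1s%4d    %8.3f%8.3f%8.3f\n' \
    "$1" "$2" "$3" "$4" "$5" "$6" 0 0 0
}

# Demo chain A with residues 1, 2 and 5 (residues 3 and 4 are missing).
demo_dir=$(mktemp -d)
{
  atom 1 ' CA ' ' ' MET A 1
  atom 2 ' CA ' ' ' LYS A 2
  atom 3 ' CA ' ' ' LEU A 5
} > "$demo_dir/demo.pdb"

# Build a FASTA record from CA atoms, using the documented ID format
# >{pdbfilename_no_dotpdb}_{chain}.
atom_seq() {  # usage: atom_seq FILE CHAIN
  id=$(basename "$1" .pdb)
  awk -v id="$id" -v chain="$2" '
    BEGIN {
      n = split("ALA A ARG R ASN N ASP D CYS C GLN Q GLU E GLY G HIS H ILE I " \
                "LEU L LYS K MET M PHE F PRO P SER S THR T TRP W TYR Y VAL V", t, " ")
      for (i = 1; i < n; i += 2) aa[t[i]] = t[i + 1]
    }
    /^ATOM/ && substr($0, 22, 1) == chain && substr($0, 13, 4) == " CA " {
      num = substr($0, 23, 4) + 0          # residue number, columns 23-26
      res = substr($0, 18, 3)              # residue name, columns 18-20
      if (seen) for (i = prev + 1; i < num; i++) seq = seq "-"   # assumed: one "-" per gap
      code = (res in aa) ? aa[res] : "X"
      seq = seq code
      prev = num; seen = 1
    }
    END { printf ">%s_%s\n%s\n", id, chain, seq }' "$1"
}

atom_seq "$demo_dir/demo.pdb" A
```

With --seqres the sequence comes from SEQRES records instead, so no gap characters of this kind would be expected.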

version Usage

Print the version number of pdbtk.

Usage:
  pdbtk version [flags]

Flags:
  -h, --help   help for version

Examples

Print the current version

$ pdbtk version
0.1.1

completion Usage

Generate the autocompletion script for the specified shell.

Usage:
  pdbtk completion [command]

Available Commands:
  bash        Generate the autocompletion script for bash
  fish        Generate the autocompletion script for fish
  powershell  Generate the autocompletion script for powershell
  zsh         Generate the autocompletion script for zsh

Flags:
  -h, --help   help for completion

Use "pdbtk completion [command] --help" for more information about a command.

See download.md for more details.

rename-chain Usage

Rename a chain in a PDB structure file.
Both the current chain ID and the new chain ID must be a single character.
If the specified chain does not exist, the command will exit with an error.
If the new chain ID already exists, a warning will be logged but the operation will continue.

Usage:
  pdbtk rename-chain [flags] <chain_id> [input_file]

Flags:
  -h, --help            help for rename-chain
  -o, --output string   Output file (default: stdout)
  -t, --to string       New chain ID (required)

Examples

Rename chain A to B

$ pdbtk rename-chain A --to B 1a02.pdb

Rename chain A to B and output to a file

$ pdbtk rename-chain A --to B --output 1a02_renamed.pdb 1a02.pdb

Rename chain A to B from stdin

$ cat 1a02.pdb | pdbtk rename-chain A --to B
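As a rough picture of what renaming involves, the chain ID lives in column 22 of ATOM, HETATM and TER records, so a rename amounts to overwriting that single column. The awk sketch below is not pdbtk's implementation. The helpers atom and rename_chain are hypothetical, and only coordinate records are touched. It also omits the documented error for a missing chain and the warning for an existing target ID.

```shell
# Hypothetical helper: print one fixed-width ATOM record (PDB columns 1-54).
# Arguments: serial, atom name (4 chars), altLoc, resName, chain, resSeq.
atom() {
  printf 'ATOM  %5d %-4s%1s%3s %1s%4d    %8.3f%8.3f%8.3f\n' \
    "$1" "$2" "$3" "$4" "$5" "$6" 0 0 0
}

demo=$(mktemp)
{
  atom 1 ' CA ' ' ' SER A 1
  atom 2 ' CA ' ' ' GLY B 1
} > "$demo"

# Overwrite column 22 where it matches the old chain ID.
rename_chain() {  # usage: rename_chain OLD NEW < in.pdb
  # Both IDs must be exactly one character, as the command requires.
  if [ "${#1}" -ne 1 ] || [ "${#2}" -ne 1 ]; then
    echo "chain IDs must be single characters" >&2
    return 1
  fi
  awk -v src="$1" -v dst="$2" '
    /^(ATOM|HETATM|TER)/ && substr($0, 22, 1) == src {
      $0 = substr($0, 1, 21) dst substr($0, 23)
    }
    { print }'
}

rename_chain A C < "$demo"
```

Because only one column changes, the line length and all other fields stay identical, which makes a diff against the pdbtk rename-chain output easy to read.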

renumber-residues Usage

Renumber residues in a PDB structure file starting from a specified number.
By default, this preserves gaps in the residue sequence but offsets the numbering.
Use --force-sequential to make all residues sequential without gaps.
Use --exclude-zero to skip residue number zero when using negative start values.

Usage:
  pdbtk renumber-residues [flags] [input_file]

Flags:
  -s, --start int          Starting residue number (can be negative) (default 1)
  -c, --chain string       Chain ID to renumber (default: all chains)
  -z, --exclude-zero       Skip residue number zero when using negative start values
  -f, --force-sequential   Force sequential numbering without gaps
  -h, --help               help for renumber-residues
  -o, --output string      Output file (default: stdout)

Examples

Renumber all residues starting from 1

$ pdbtk renumber-residues --start 1 1a02.pdb

Renumber residues in chain A starting from 1

$ pdbtk renumber-residues --start 1 --chain A 1a02.pdb

Force sequential numbering starting from 1

$ pdbtk renumber-residues --start 1 --force-sequential 1a02.pdb

Renumber starting from negative number

$ pdbtk renumber-residues --start -10 1a02.pdb

Renumber starting from -1, skipping zero (goes -1, 1, 2, 3...)

$ pdbtk renumber-residues --start -1 --exclude-zero 1a02.pdb

Renumber and output to a file

$ pdbtk renumber-residues --start 1 --output 1a02_renumbered.pdb 1a02.pdb
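The difference between the default offset mode, --force-sequential and --exclude-zero is easiest to see on bare residue numbers. The sketch below works on a list of numbers rather than a PDB file and is not pdbtk's implementation. The helper renumber and its mode names offset and sequential are hypothetical. Two further assumptions: the offset is taken as start minus the first residue number, and with --exclude-zero every new number at or above zero is shifted up by one.

```shell
# Hypothetical helper: renumber a list of residue numbers.
# usage: renumber START MODE EXCLUDE_ZERO NUM...
#   MODE          offset (keep gaps, shift numbering) or sequential (no gaps)
#   EXCLUDE_ZERO  yes or no
renumber() {
  start=$1; mode=$2; skip_zero=$3
  shift 3
  printf '%s\n' "$@" | awk -v start="$start" -v mode="$mode" -v skip="$skip_zero" '
    NR == 1 { first = $1 }
    {
      # sequential: count up from start; offset: preserve distance to first residue
      n = (mode == "sequential") ? start + NR - 1 : $1 - first + start
      # assumed --exclude-zero rule: with a negative start, skip over zero
      if (skip == "yes" && start < 0 && n >= 0) n++
      out = out sep n
      sep = " "
    }
    END { print out }'
}

renumber 1 offset no 5 6 9        # gaps preserved
renumber 1 sequential no 5 6 9    # gaps removed
renumber -1 offset yes 5 6 7 8    # zero skipped
```

Under these assumptions the three calls print "1 2 5", "1 2 3" and "-1 1 2 3", the last matching the -1, 1, 2, 3 progression described for --exclude-zero.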