bed_reader


§bed-reader

Read and write the PLINK BED format, simply and efficiently.

§Highlights

  • Fast and multi-threaded.
  • Supports many indexing methods. Slice data by individuals (samples) and/or SNPs (variants).
  • The Python-facing APIs for this library are used by PySnpTools, FaST-LMM, and PyStatGen.
  • Supports PLINK 1.9.
  • Read data locally or from the cloud, efficiently and directly.

§Install

Full version: Can read local and cloud files

cargo add bed-reader

Minimal version: Can read local files only

cargo add bed-reader --no-default-features

§Examples

Read all genotype data from a .bed file.

use ndarray as nd;
use bed_reader::{Bed, ReadOptions, assert_eq_nan, sample_bed_file};

let file_name = sample_bed_file("small.bed")?;
let mut bed = Bed::new(file_name)?;
let val = ReadOptions::builder().f64().read(&mut bed)?;

assert_eq_nan(
    &val,
    &nd::array![
        [1.0,0.0,f64::NAN,0.0],
        [2.0,0.0,f64::NAN,2.0],
        [0.0,1.0,2.0,0.0]
    ],
);

Read every second individual (sample) and SNPs (variants) 20 to 30.

use ndarray::s;

let file_name = sample_bed_file("some_missing.bed")?;
let mut bed = Bed::new(file_name)?;
let val = ReadOptions::builder()
    .iid_index(s![..;2])
    .sid_index(20..30)
    .f64()
    .read(&mut bed)?;

assert!(val.dim() == (50, 10));
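
The expected shape in the assertion follows from simple counting. The sketch below uses only the standard library and does not call bed_reader; the helper names are hypothetical, and it assumes (as the assertion implies) that the sample file holds 100 individuals.

```rust
/// Number of positions selected by `start..end` taken in steps of `step`.
/// Plain standard-library arithmetic; not part of the bed_reader API.
/// Note: `step` must be at least 1, because `step_by(0)` panics.
fn selected_len(start: usize, end: usize, step: usize) -> usize {
    (start..end).step_by(step).count()
}

/// Shape (individuals, SNPs) of a read that takes every `iid_step`-th
/// individual and the SNPs in `sid_start..sid_end` (end exclusive).
fn selection_shape(
    iid_count: usize,
    iid_step: usize,
    sid_start: usize,
    sid_end: usize,
) -> (usize, usize) {
    (
        selected_len(0, iid_count, iid_step),
        selected_len(sid_start, sid_end, 1),
    )
}
```

With 100 individuals, a step of 2, and the range 20..30, `selection_shape(100, 2, 20, 30)` gives 50 rows and 10 columns, matching the assertion.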

List the first 5 individual (sample) ids, the first 5 SNP (variant) ids,
and every unique chromosome. Then, read every genomic value in chromosome 5.

use std::collections::HashSet;

let mut bed = Bed::new(file_name)?;
println!("{:?}", bed.iid()?.slice(s![..5]));
println!("{:?}", bed.sid()?.slice(s![..5]));
println!("{:?}", bed.chromosome()?.iter().collect::<HashSet<_>>());
let val = ReadOptions::builder()
    .sid_index(bed.chromosome()?.map(|elem| elem == "5"))
    .f64()
    .read(&mut bed)?;

assert!(val.dim() == (100, 6));
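
The chromosome filter above is a Boolean mask: one true/false value per SNP. As a rough illustration of that idea, here is a standard-library sketch with hypothetical helper names and made-up chromosome labels. It does not use the crate's own types.

```rust
use std::collections::HashSet;

/// Build a Boolean mask that is true where `chromosome[i] == target`.
fn chromosome_mask(chromosome: &[&str], target: &str) -> Vec<bool> {
    chromosome.iter().map(|c| *c == target).collect()
}

/// Convert a Boolean mask into the index positions it selects.
fn mask_to_positions(mask: &[bool]) -> Vec<usize> {
    mask.iter()
        .enumerate()
        .filter(|(_, keep)| **keep)
        .map(|(i, _)| i)
        .collect()
}

/// Collect the distinct chromosome labels.
fn unique_chromosomes<'a>(chromosome: &[&'a str]) -> HashSet<&'a str> {
    chromosome.iter().copied().collect()
}
```

The number of true entries in the mask is the number of SNP columns in the result, which is why the example expects 6 columns for chromosome 5.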

From the cloud: open a file and read data for one SNP (variant)
at index position 2. (See “Cloud URLs and CloudFile Examples”
for details on specifying a file in the cloud.)

use ndarray as nd;
use bed_reader::{assert_eq_nan, BedCloud, ReadOptions};

let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/small.bed";
let mut bed_cloud = BedCloud::new(url).await?;
let val = ReadOptions::builder().sid_index(2).f64().read_cloud(&mut bed_cloud).await?;
assert_eq_nan(&val, &nd::array![[f64::NAN], [f64::NAN], [2.0]]);
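
One reason a single SNP can be fetched cheaply is the file layout itself. In a SNP-major PLINK 1 .bed file, each SNP occupies one contiguous run of bytes after a 3-byte header. The sketch below computes that byte range. The function name is hypothetical, and whether the crate issues exactly these range requests is an assumption, not something this page states.

```rust
/// Length of the PLINK 1 .bed header (two magic bytes plus a mode byte).
const BED_HEADER_LEN: u64 = 3;

/// Byte range `[start, end)` occupied by one SNP in a SNP-major .bed file.
/// Each individual takes 2 bits, so one SNP needs ceil(iid_count / 4) bytes.
fn snp_byte_range(iid_count: u64, sid_index: u64) -> (u64, u64) {
    let bytes_per_snp = (iid_count + 3) / 4;
    let start = BED_HEADER_LEN + sid_index * bytes_per_snp;
    (start, start + bytes_per_snp)
}
```

For a file with 3 individuals, such as the 3-by-4 array in the first example, the SNP at index position 2 sits in the single byte at offset 5.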

§Project Links

§Main Functions

After using Bed::new or Bed::builder to open a PLINK .bed file for reading, use
these methods to see metadata.

Method        Description
iid_count     Number of individuals (samples)
sid_count     Number of SNPs (variants)
dim           Number of individuals and SNPs
fid           Family ID of each individual (sample)
iid           Individual ID of each individual (sample)
father        Father ID of each individual (sample)
mother        Mother ID of each individual (sample)
sex           Sex of each individual (sample)
pheno         A phenotype for each individual (seldom used)
chromosome    Chromosome of each SNP (variant)
sid           SNP ID of each SNP (variant)
cm_position   Centimorgan position of each SNP (variant)
bp_position   Base-pair position of each SNP (variant)
allele_1      First allele of each SNP (variant)
allele_2      Second allele of each SNP (variant)
metadata      All the metadata returned as a Metadata struct
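
The per-individual fields listed above match the six columns of a PLINK .fam file, and the per-SNP fields match the six columns of a .bim file. As a minimal sketch of that correspondence, here is a standard-library parser. The structs and functions are hypothetical illustrations and are not the crate's Metadata type; the numeric field types are also assumptions.

```rust
/// One row of a .fam file (hypothetical struct for illustration).
#[derive(Debug, PartialEq)]
struct FamRow {
    fid: String,
    iid: String,
    father: String,
    mother: String,
    sex: i32,
    pheno: String,
}

/// One row of a .bim file (hypothetical struct for illustration).
#[derive(Debug, PartialEq)]
struct BimRow {
    chromosome: String,
    sid: String,
    cm_position: f32,
    bp_position: i32,
    allele_1: String,
    allele_2: String,
}

/// Parse one whitespace-delimited .fam line; returns None if malformed.
fn parse_fam_line(line: &str) -> Option<FamRow> {
    let f: Vec<&str> = line.split_whitespace().collect();
    if f.len() != 6 {
        return None;
    }
    Some(FamRow {
        fid: f[0].to_string(),
        iid: f[1].to_string(),
        father: f[2].to_string(),
        mother: f[3].to_string(),
        sex: f[4].parse().ok()?,
        pheno: f[5].to_string(),
    })
}

/// Parse one whitespace-delimited .bim line; returns None if malformed.
fn parse_bim_line(line: &str) -> Option<BimRow> {
    let f: Vec<&str> = line.split_whitespace().collect();
    if f.len() != 6 {
        return None;
    }
    Some(BimRow {
        chromosome: f[0].to_string(),
        sid: f[1].to_string(),
        cm_position: f[2].parse().ok()?,
        bp_position: f[3].parse().ok()?,
        allele_1: f[4].to_string(),
        allele_2: f[5].to_string(),
    })
}
```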

§ReadOptions

When using ReadOptions::builder to read genotype data, use these options to
specify a desired numeric type,
which individuals (samples) to read, which SNPs (variants) to read, etc.

Option                   Description
i8                       Read values as i8
f32                      Read values as f32
f64                      Read values as f64
iid_index                Index of individuals (samples) to read (defaults to all)
sid_index                Index of SNPs (variants) to read (defaults to all)
f                        Order of the output array, Fortran-style (default)
c                        Order of the output array, C-style
is_f                     Is order of the output array Fortran-style? (defaults to true)
missing_value            Value to use for missing values (defaults to -127 or NaN)
count_a1                 Count the number of allele 1 (default)
count_a2                 Count the number of allele 2
is_a1_counted            Is allele 1 counted? (defaults to true)
num_threads              Number of threads to use (defaults to all processors)
max_concurrent_requests  Maximum number of concurrent async requests (defaults to 10). Used by BedCloud.
max_chunk_bytes          Maximum chunk size of async requests (defaults to 8_000_000 bytes). Used by BedCloud.
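
To make the count_a1, count_a2, and missing_value options more concrete, the sketch below decodes one byte of a PLINK 1 .bed file into four i8 values. The 2-bit codes come from the PLINK 1 file format. The `Counted` enum and `decode_byte` function are hypothetical and are not the crate's implementation.

```rust
/// Which allele the output values count (mirrors count_a1 / count_a2).
#[derive(Clone, Copy)]
enum Counted {
    A1,
    A2,
}

/// Decode one .bed byte into four genotype values, lowest two bits first.
/// PLINK 1 codes: 0b00 = homozygous A1, 0b01 = missing,
/// 0b10 = heterozygous, 0b11 = homozygous A2.
fn decode_byte(byte: u8, counted: Counted, missing_value: i8) -> [i8; 4] {
    let mut out = [0i8; 4];
    for (k, slot) in out.iter_mut().enumerate() {
        let code = (byte >> (2 * k)) & 0b11;
        // Number of copies of allele 1 carried by this individual.
        let a1_count = match code {
            0b00 => 2,
            0b10 => 1,
            0b11 => 0,
            _ => {
                *slot = missing_value;
                continue;
            }
        };
        *slot = match counted {
            Counted::A1 => a1_count,
            Counted::A2 => 2 - a1_count,
        };
    }
    out
}
```

Switching from allele 1 to allele 2 swaps 0 and 2 and leaves heterozygous and missing entries unchanged. When the number of individuals is not a multiple of 4, the last byte of each SNP carries padding bits that a reader would ignore.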

Select which individuals (samples) and SNPs (variants) to read by using these
iid_index and/or sid_index expressions.

Example                             Type                    Description
nothing                             ()                      All
2                                   isize                   Index position 2
-1                                  isize                   Last index position
vec![0, 10, -2]                     Vec<isize>              Index positions 0, 10, and 2nd from last
[0, 10, -2]                         [isize] and [isize; n]  Index positions 0, 10, and 2nd from last
ndarray::array![0, 10, -2]          ndarray::Array1<isize>  Index positions 0, 10, and 2nd from last
10..20                              Range<usize>            Index positions 10 (inclusive) to 20 (exclusive). Note: Rust ranges don't support negatives
..=19                               RangeInclusive<usize>   Index positions 0 (inclusive) to 19 (inclusive). Note: Rust ranges don't support negatives
any Rust ranges                     Range*<usize>           Note: Rust ranges don't support negatives
s![10..20;2]                        ndarray::SliceInfo1     Index positions 10 (inclusive) to 20 (exclusive) in steps of 2
s![-20..-10;-2]                     ndarray::SliceInfo1     10th from last (exclusive) to 20th from last (inclusive), in steps of -2
vec![true, false, true]             Vec<bool>               Index positions 0 and 2
[true, false, true]                 [bool] and [bool; n]    Index positions 0 and 2
ndarray::array![true, false, true]  ndarray::Array1<bool>   Index positions 0 and 2
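
Negative entries in the table count from the end of the axis. A minimal sketch of how such an index could be resolved against an axis length is shown below. The function names are hypothetical, and returning None for out-of-range values is an assumption; the crate may report an error instead.

```rust
/// Resolve a possibly negative index against an axis of length `count`.
/// -1 means the last position; returns None when out of bounds.
fn resolve_index(index: isize, count: usize) -> Option<usize> {
    let count = count as isize;
    let resolved = if index < 0 { index + count } else { index };
    if (0..count).contains(&resolved) {
        Some(resolved as usize)
    } else {
        None
    }
}

/// Resolve every entry of a list such as `[0, 10, -2]`.
/// The whole result is None if any single entry is out of bounds.
fn resolve_all(indices: &[isize], count: usize) -> Option<Vec<usize>> {
    indices.iter().map(|&i| resolve_index(i, count)).collect()
}
```

For an axis of length 100, `[0, 10, -2]` resolves to positions 0, 10, and 98.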

§Environment Variables

If ReadOptionsBuilder::num_threads
or WriteOptionsBuilder::num_threads is not specified,
the number of threads to use is determined by these environment variables (in order of priority):

  • BED_READER_NUM_THREADS
  • NUM_THREADS

If neither of these environment variables is set, all processors are used.
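
The priority order described here can be sketched with the standard library alone. The function names below are hypothetical. Skipping values that do not parse as a number is an assumption (the crate may raise an error instead), as is the fallback of 1 when the processor count is unavailable.

```rust
use std::thread;

/// Pick a thread count: explicit option first, then each environment
/// variable in priority order, then all available processors.
/// `lookup` stands in for `std::env::var` so the logic can be tested.
fn resolve_num_threads<F>(explicit: Option<usize>, lookup: F) -> usize
where
    F: Fn(&str) -> Option<String>,
{
    if let Some(n) = explicit {
        return n;
    }
    for name in ["BED_READER_NUM_THREADS", "NUM_THREADS"] {
        // Assumption: a value that is not a valid number is skipped.
        if let Some(n) = lookup(name).and_then(|v| v.trim().parse::<usize>().ok()) {
            return n;
        }
    }
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

/// Convenience wrapper that reads the real process environment.
fn num_threads_from_env(explicit: Option<usize>) -> usize {
    resolve_num_threads(explicit, |name| std::env::var(name).ok())
}
```

Passing the lookup as a closure keeps the priority logic separate from the process environment, so it can be checked without setting real variables.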

Any requested sample file will be downloaded to this directory. If the environment variable is not set,
a cache folder, appropriate to the OS, will be used.