bed_reader


§bed-reader


Read and write the PLINK BED format, simply and efficiently.

§Highlights

  • Fast and multi-threaded
  • Supports many indexing methods. Slice data by individuals (samples) and/or SNPs (variants).
  • The Python-facing API for this library is used by PySnpTools, FaST-LMM, and PyStatGen.
  • Supports PLINK 1.9.
  • Read data locally or from the cloud, efficiently and directly.

§Install

Full version: Can read local and cloud files

Minimal version: Can read local files only

cargo add bed-reader --no-default-features

§Examples

Read all genotype data from a .bed file.

use ndarray as nd;
use bed_reader::{Bed, ReadOptions, assert_eq_nan, sample_bed_file};

// Locate (downloading if needed) the small sample file and open it.
let file_name = sample_bed_file("small.bed")?;
let mut bed = Bed::new(file_name)?;
// Read every individual and every SNP as f64.
let val = ReadOptions::builder().f64().read(&mut bed)?;

assert_eq_nan(
    &val,
    &nd::array![
        [1.0, 0.0, f64::NAN, 0.0],
        [2.0, 0.0, f64::NAN, 2.0],
        [0.0, 1.0, 2.0, 0.0]
    ],
);
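To show where values such as 0.0, 1.0, 2.0, and NaN come from, here is a minimal std-only sketch of decoding one SNP's block from a SNP-major PLINK 1.9 .bed file. It assumes the published PLINK 1.9 encoding (two bits per genotype, first individual in the lowest-order bits; 00 = homozygous for allele 1, 01 = missing, 10 = heterozygous, 11 = homozygous for allele 2). decode_snp is a hypothetical helper for illustration, not bed_reader's implementation.

```rust
/// Hypothetical helper: decode one SNP (variant) from its packed block of a
/// SNP-major PLINK 1.9 .bed file. Each byte holds four individuals, two bits
/// each, lowest-order bits first. Returns copies of allele 1 (or of allele 2
/// when `count_a1` is false), with NaN for missing genotypes.
fn decode_snp(block: &[u8], iid_count: usize, count_a1: bool) -> Vec<f64> {
    (0..iid_count)
        .map(|i| {
            // Pick the byte for individual i, then its two-bit code.
            let code = (block[i / 4] >> (2 * (i % 4))) & 0b11;
            let a1_copies = match code {
                0b00 => 2.0,      // homozygous for allele 1
                0b01 => f64::NAN, // missing genotype
                0b10 => 1.0,      // heterozygous
                _ => 0.0,         // 0b11: homozygous for allele 2
            };
            // Counting allele 2 instead flips the count; NaN stays NaN.
            if count_a1 { a1_copies } else { 2.0 - a1_copies }
        })
        .collect()
}
```

For example, decode_snp(&[0b00_01_10_00], 3, true) yields [2.0, 1.0, NaN]: read from the low bits up, the codes are 00, 10, and 01, and the top two bits are padding because only three individuals are stored.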

Read every second individual (sample) and SNPs (variants) 20 to 30.

use ndarray::s;

let file_name = sample_bed_file("some_missing.bed")?;
let mut bed = Bed::new(file_name)?;
let val = ReadOptions::builder()
    .iid_index(s![..;2]) // every second individual
    .sid_index(20..30)   // SNPs 20 (inclusive) to 30 (exclusive)
    .f64()
    .read(&mut bed)?;

assert!(val.dim() == (50, 10));

List the first 5 individual (sample) ids, the first 5 SNP (variant) ids,
and every unique chromosome. Then, read every genomic value in chromosome 5.

use std::collections::HashSet;

// Continues the previous example: reuses `file_name` and `s`.
let mut bed = Bed::new(file_name)?;
println!("{:?}", bed.iid()?.slice(s![..5]));
println!("{:?}", bed.sid()?.slice(s![..5]));
println!("{:?}", bed.chromosome()?.iter().collect::<HashSet<_>>());
// Select SNPs with a boolean array: true where the chromosome is "5".
let val = ReadOptions::builder()
    .sid_index(bed.chromosome()?.map(|elem| elem == "5"))
    .f64()
    .read(&mut bed)?;

assert!(val.dim() == (100, 6));
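The chromosome filter above passes a boolean array as sid_index, so SNP position j is read exactly when element j is true. The std-only sketch below makes that conversion explicit; chromosome_mask and mask_to_positions are hypothetical helpers for illustration, not part of bed_reader.

```rust
/// Hypothetical helper: build a boolean mask from per-SNP chromosome labels,
/// mirroring `bed.chromosome()?.map(|elem| elem == "5")` in the example above.
fn chromosome_mask(chromosomes: &[&str], wanted: &str) -> Vec<bool> {
    chromosomes.iter().map(|&c| c == wanted).collect()
}

/// Hypothetical helper: convert a boolean mask into the index positions it selects.
fn mask_to_positions(mask: &[bool]) -> Vec<usize> {
    mask.iter()
        .enumerate()
        .filter_map(|(i, &keep)| if keep { Some(i) } else { None })
        .collect()
}
```

With labels ["1", "5", "5", "22", "5"], the mask for "5" is [false, true, true, false, true], which selects positions 1, 2, and 4.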

From the cloud: open a file and read data for one SNP (variant)
at index position 2. (See "Cloud URLs and CloudFile Examples"
for details on specifying a file in the cloud.)

use ndarray as nd;
use bed_reader::{assert_eq_nan, BedCloud, ReadOptions};

let url = "https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/small.bed";
let mut bed_cloud = BedCloud::new(url).await?;
// Read only the SNP at index position 2, for all individuals.
let val = ReadOptions::builder().sid_index(2).f64().read_cloud(&mut bed_cloud).await?;
assert_eq_nan(&val, &nd::array![[f64::NAN], [f64::NAN], [2.0]]);
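Reading a single SNP from the cloud can be cheap because a SNP-major .bed file stores each SNP in a fixed-size block after a 3-byte header (magic bytes 0x6C 0x1B, then 0x01 for SNP-major mode). The sketch below computes which bytes hold a given SNP. snp_byte_range and is_snp_major_bed are hypothetical helpers, and the idea that BedCloud fetches ranges computed this way is an assumption for illustration.

```rust
use std::ops::Range;

/// Size of the PLINK .bed header: two magic bytes plus one mode byte.
const HEADER_BYTES: usize = 3;

/// Hypothetical helper: byte range of SNP `sid_index` in a SNP-major .bed
/// file with `iid_count` individuals (four individuals per byte, rounded up).
fn snp_byte_range(iid_count: usize, sid_index: usize) -> Range<usize> {
    let bytes_per_snp = (iid_count + 3) / 4;
    let start = HEADER_BYTES + sid_index * bytes_per_snp;
    start..start + bytes_per_snp
}

/// Hypothetical helper: check for the 0x6C 0x1B magic bytes and SNP-major mode 0x01.
fn is_snp_major_bed(header: &[u8]) -> bool {
    header.len() >= HEADER_BYTES && header[..HEADER_BYTES] == [0x6C, 0x1B, 0x01]
}
```

For small.bed, which holds three individuals, SNP index 2 lives in bytes 5..6, so one one-byte range request would be enough.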

§Project Links

§Main Functions

After using Bed::new or Bed::builder to open a PLINK .bed file for reading, use
these methods to see metadata.

| Method | Description |
|---|---|
| iid_count | Number of individuals (samples) |
| sid_count | Number of SNPs (variants) |
| dim | Number of individuals and SNPs |
| fid | Family id of each individual (sample) |
| iid | Individual id of each individual (sample) |
| father | Father id of each individual (sample) |
| mother | Mother id of each individual (sample) |
| sex | Sex of each individual (sample) |
| pheno | A phenotype for each individual (seldom used) |
| chromosome | Chromosome of each SNP (variant) |
| sid | SNP id of each SNP (variant) |
| cm_position | Centimorgan position of each SNP (variant) |
| bp_position | Base-pair position of each SNP (variant) |
| allele_1 | First allele of each SNP (variant) |
| allele_2 | Second allele of each SNP (variant) |
| metadata | All the metadata, returned as a Metadata struct |
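These methods correspond to the columns of the PLINK companion text files: .fam holds family id, individual id, father, mother, sex, and phenotype; .bim holds chromosome, SNP id, centimorgan position, base-pair position, allele 1, and allele 2. The std-only sketch below splits one line of each. FamRow, BimRow, and the parse functions are hypothetical, and the field types are assumptions; bed_reader's own Metadata type and parsing rules may differ.

```rust
/// Hypothetical struct: one line of a PLINK .fam file.
#[derive(Debug, PartialEq)]
struct FamRow {
    fid: String,
    iid: String,
    father: String,
    mother: String,
    sex: i32,
    pheno: String,
}

/// Hypothetical struct: one line of a PLINK .bim file.
#[derive(Debug, PartialEq)]
struct BimRow {
    chromosome: String,
    sid: String,
    cm_position: f32,
    bp_position: i32,
    allele_1: String,
    allele_2: String,
}

/// Split a whitespace-delimited line into exactly `n` fields, or None.
fn fields(line: &str, n: usize) -> Option<Vec<&str>> {
    let parts: Vec<&str> = line.split_whitespace().collect();
    if parts.len() == n { Some(parts) } else { None }
}

/// Parse one .fam line; None if the column count or sex code is invalid.
fn parse_fam_line(line: &str) -> Option<FamRow> {
    let f = fields(line, 6)?;
    Some(FamRow {
        fid: f[0].to_string(),
        iid: f[1].to_string(),
        father: f[2].to_string(),
        mother: f[3].to_string(),
        sex: f[4].parse().ok()?,
        pheno: f[5].to_string(),
    })
}

/// Parse one .bim line; None if the column count or a position is invalid.
fn parse_bim_line(line: &str) -> Option<BimRow> {
    let f = fields(line, 6)?;
    Some(BimRow {
        chromosome: f[0].to_string(),
        sid: f[1].to_string(),
        cm_position: f[2].parse().ok()?,
        bp_position: f[3].parse().ok()?,
        allele_1: f[4].to_string(),
        allele_2: f[5].to_string(),
    })
}
```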

§ReadOptions

When using ReadOptions::builder to read genotype data, use these options to
specify a desired numeric type,
which individuals (samples) to read, which SNPs (variants) to read, etc.

| Option | Description |
|---|---|
| i8 | Read values as i8 |
| f32 | Read values as f32 |
| f64 | Read values as f64 |
| iid_index | Index of individuals (samples) to read (defaults to all) |
| sid_index | Index of SNPs (variants) to read (defaults to all) |
| f | Order of the output array, Fortran-style (default) |
| c | Order of the output array, C-style |
| is_f | Is the order of the output array Fortran-style? (defaults to true) |
| missing_value | Value to use for missing values (defaults to -127 or NaN) |
| count_a1 | Count the number of copies of allele 1 (default) |
| count_a2 | Count the number of copies of allele 2 |
| is_a1_counted | Is allele 1 counted? (defaults to true) |
| num_threads | Number of threads to use (defaults to all processors) |
| max_concurrent_requests | Maximum number of concurrent async requests (defaults to 10). Used by BedCloud. |
| max_chunk_bytes | Maximum chunk size of async requests (defaults to 8_000_000 bytes). Used by BedCloud. |
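The f and c options choose the memory layout of the returned 2-D array, whose rows are individuals and whose columns are SNPs. The std-only sketch below shows what that means for element offsets in a flat buffer; flat_offset is a hypothetical helper. In Fortran (column-major) order each SNP's column is contiguous; in C (row-major) order each individual's row is contiguous.

```rust
/// Hypothetical helper: offset of element (row, col) in a flat buffer holding
/// an `nrows` x `ncols` array. Rows are individuals, columns are SNPs.
fn flat_offset(row: usize, col: usize, nrows: usize, ncols: usize, is_f: bool) -> usize {
    assert!(row < nrows && col < ncols, "index out of bounds");
    if is_f {
        col * nrows + row // Fortran order: walk down a column first
    } else {
        row * ncols + col // C order: walk along a row first
    }
}
```

For a 3 x 4 array like the small.bed result, element (0, 1) sits at offset 3 in Fortran order but at offset 1 in C order. Fortran order plausibly suits SNP-major .bed files, since each decoded SNP fills one contiguous column, but that rationale is an inference rather than something stated here.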

Select which individuals (samples) and SNPs (variants) to read by using these
expressions with iid_index and/or
sid_index.

| Example | Type | Description |
|---|---|---|
| nothing | () | All |
| 2 | isize | Index position 2 |
| -1 | isize | Last index position |
| vec![0, 10, -2] | Vec<isize> | Index positions 0, 10, and 2nd from last |
| [0, 10, -2] | [isize] and [isize;n] | Index positions 0, 10, and 2nd from last |
| ndarray::array![0, 10, -2] | ndarray::Array1<isize> | Index positions 0, 10, and 2nd from last |
| 10..20 | Range<usize> | Index positions 10 (inclusive) to 20 (exclusive). Note: Rust ranges don't support negatives |
| ..=19 | RangeInclusive<usize> | Index positions 0 (inclusive) to 19 (inclusive). Note: Rust ranges don't support negatives |
| any Rust range | Range*<usize> | Note: Rust ranges don't support negatives |
| s![10..20;2] | ndarray::SliceInfo1 | Index positions 10 (inclusive) to 20 (exclusive) in steps of 2 |
| s![-20..-10;-2] | ndarray::SliceInfo1 | 10th from last (exclusive) to 20th from last (inclusive), in steps of -2 |
| vec![true, false, true] | Vec<bool> | Index positions 0 and 2 |
| [true, false, true] | [bool] and [bool;n] | Index positions 0 and 2 |
| ndarray::array![true, false, true] | ndarray::Array1<bool> | Index positions 0 and 2 |
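Negative positions in the table count back from the end, as in Python. The std-only sketch below captures that rule plus the stepped-range case with a positive step; resolve_index, resolve_all, and stepped_range are hypothetical helpers, not bed_reader's code, and ndarray's handling of negative steps is deliberately not reproduced.

```rust
/// Hypothetical helper: resolve one possibly negative index against `count`
/// items (-1 is the last position, -2 the second from last, and so on).
fn resolve_index(index: isize, count: usize) -> Option<usize> {
    let count = count as isize;
    let pos = if index < 0 { count + index } else { index };
    if (0..count).contains(&pos) { Some(pos as usize) } else { None }
}

/// Hypothetical helper: resolve a list such as [0, 10, -2]; None if any is out of range.
fn resolve_all(indexes: &[isize], count: usize) -> Option<Vec<usize>> {
    indexes.iter().map(|&i| resolve_index(i, count)).collect()
}

/// Hypothetical helper: positions in `start..end` taken every `step` items,
/// like s![start..end;step] with negative bounds resolved first. Requires step > 0.
fn stepped_range(start: isize, end: isize, step: usize, count: usize) -> Vec<usize> {
    let n = count as isize;
    // Resolve negative bounds and clamp both bounds into 0..=count.
    let fix = |i: isize| if i < 0 { (n + i).max(0) } else { i.min(n) };
    let (lo, hi) = (fix(start), fix(end));
    (lo..hi).step_by(step).map(|i| i as usize).collect()
}
```

With 100 individuals, as in some_missing.bed, stepping through 0..100 by 2 selects 50 positions, matching the 50 rows returned by s![..;2] in the earlier example.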

§Environment Variables

If ReadOptionsBuilder::num_threads
or WriteOptionsBuilder::num_threads is not specified,
the number of threads to use is determined by these environment variables (in order of priority):

  • BED_READER_NUM_THREADS
  • NUM_THREADS

If neither of these environment variables is set, all processors are used.
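The priority rule for thread counts can be written as a short lookup chain. This is a hypothetical sketch, not bed_reader's code: resolve_num_threads takes the environment lookup as a closure so it can be tested without touching the real environment, and it treats a value that does not parse as a positive integer as unset, which is an assumption.

```rust
use std::thread;

/// Hypothetical helper: an explicit option wins; otherwise check
/// BED_READER_NUM_THREADS, then NUM_THREADS; otherwise use all processors.
fn resolve_num_threads<F>(explicit: Option<usize>, lookup: F) -> usize
where
    F: Fn(&str) -> Option<String>,
{
    if let Some(n) = explicit {
        return n;
    }
    for var in ["BED_READER_NUM_THREADS", "NUM_THREADS"] {
        // Assumption: unparsable or zero values are ignored.
        if let Some(n) = lookup(var).and_then(|v| v.trim().parse::<usize>().ok()) {
            if n > 0 {
                return n;
            }
        }
    }
    // Fall back to the number of available processors (1 if unknown).
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}
```

In a program, the real environment can be passed as resolve_num_threads(None, |name| std::env::var(name).ok()).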

Any requested sample file will be downloaded to this directory. If the environment variable is not set,
a cache folder appropriate to the OS will be used.