(3) provide the user with a convenient class of objects where each instance can store a set of views on the same big string
(these views being typically the matches returned by a search algorithm)
4. Testing
4.1 The XString class and its subsetting operator [
b <- BString("I am a BString object")
#@ Contents of b:
b
21-letter "BString" instance
seq: I am a BString object
#@ Length of b:
length(b)
[1] 21
#@ A DNAString object:
d <- DNAString("TTGAAAA-CTC-N")
d
13-letter "DNAString" instance
seq: TTGAAAA-CTC-N
#@ Length of d:
length(d)
[1] 13
#@ The differences from a BString object are: (1) only letters from the IUPAC extended genetic alphabet plus the gap letter (-) are allowed, and (2) each letter in the argument passed to the DNAString function is encoded in a special way before it is stored in the DNAString object.
Access to the individual letters:
#@ Third letter of d:
d[3]
1-letter "DNAString" instance
seq: G
#@ Letters 7 to 12 of d:
d[7:12]
6-letter "DNAString" instance
seq: A-CTC-
#@ Letters 1 to 3 of d:
d[1:3]
3-letter "DNAString" instance
seq: TTG
#@ All letters of d:
d[]
13-letter "DNAString" instance
seq: TTGAAAA-CTC-N
#@ Compare b in reverse order with b in its original order:
b[length(b):1]
21-letter "BString" instance
seq: tcejbo gnirtSB a ma I
b
21-letter "BString" instance
seq: I am a BString object
#@ Only in-bounds positive numeric subscripts are supported. In fact, the subsetting operator for XString objects is not efficient, and one should always use the subseq method to extract a substring from a big string:
bb <- subseq(b, 3, 6)
bb
4-letter "BString" instance
seq: am a
dd1 <- subseq(d, end=7)
dd1
7-letter "DNAString" instance
seq: TTGAAAA
dd2 <- subseq(d, start=8)
dd2
6-letter "DNAString" instance
seq: -CTC-N
#@ To dump an XString object as a character vector (of length 1), use the toString method:
toString(dd2)
[1] "-CTC-N"
Note that length(dd2) is equivalent to nchar(toString(dd2)) but the latter would be very inefficient on a big DNAString object.
[TODO: Make a generic of the substr() function to work with XString objects. It will be essentially doing toString(subseq()).]
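The TODO above describes a substr-like function as essentially toString(subseq()). A minimal sketch of that idea follows, assuming the Biostrings package is installed; the helper name xsubstr is hypothetical and is not part of Biostrings.

```r
library(Biostrings)  # assumes the Bioconductor package Biostrings is installed

# Hypothetical helper (not a Biostrings function): extract a substring of an
# XString object and return it as a length-1 character vector.
xsubstr <- function(x, start, stop) {
  toString(subseq(x, start = start, end = stop))
}

d <- DNAString("TTGAAAA-CTC-N")
dd2 <- subseq(d, start = 8)

# Both expressions count the letters of dd2; length() does so without
# first building a character vector.
n_direct <- length(dd2)
n_via_string <- nchar(toString(dd2))

first_three <- xsubstr(d, 1, 3)
```

Here first_three holds the same letters as d[1:3] in the session above, but as a plain character string rather than a DNAString object.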
4.2 The == binary operator for XString objects
#@ The 2 following comparisons are TRUE:
bb == "am a"
[1] TRUE
bb
4-letter "BString" instance
seq: am a
dd2 != DNAString("TG")
[1] TRUE
dd2
6-letter "DNAString" instance
seq: -CTC-N
#@ When the 2 sides of == don't belong to the same class, the side belonging to the "lowest" class is first converted to an object belonging to the class of the other side (the "highest" class).
#@ The class (pseudo-)order is character < BString < DNAString. When both sides are XString objects of the same subtype (e.g. both are DNAString objects), the comparison is very fast because it only has to call the C standard function memcmp(), and no memory allocation or string encoding/decoding is required.
#@ The 2 following expressions raise an error because the right member can't be "upgraded" (converted) to an object of the same class as the left member:
bb == ""
Error in bb == "" :
comparison between a "BString" object and a character vector of length != 1 or an empty string or an NA is not supported
d == bb
Error in d == bb :
comparison between a "DNAString" instance and a "BString" instance is not supported
#@ When comparing an RNAString object with a DNAString object, U and T are considered equal:
r <- RNAString(d)
r
13-letter "RNAString" instance
seq: UUGAAAA-CUC-N
r == d
[1] TRUE
4.3 The XStringViews class and its subsetting operators [ and [[
#@ An XStringViews object contains a set of views on the same XString object called the subject string. Here is an XStringViews object with 4 views:
v4 <- Views(dd2, start=3:0, end=5:8)
class(v4)
[1] "XStringViews"
attr(,"package")
[1] "Biostrings"
v4
Views on a 6-letter DNAString subject
subject: -CTC-N
views:
start end width
[1] 3 5 3 [TC-]
[2] 2 6 5 [CTC-N]
[3] 1 7 7 [-CTC-N ]
[4] 0 8 9 [ -CTC-N ]
length(v4)
[1] 4
test_v <- Views(dd2, start = 4:1, end = 5:8)
class(test_v)
[1] "XStringViews"
attr(,"package")
[1] "Biostrings"
test_v
Views on a 6-letter DNAString subject
subject: -CTC-N
views:
start end width
[1] 4 5 2 [C-]
[2] 3 6 4 [TC-N]
[3] 2 7 6 [CTC-N ]
[4] 1 8 8 [-CTC-N ]
#@ Note that the last 2 views are out of limits.
#@ You can select a subset of views from an XStringViews object:
v4[4:2]
Views on a 6-letter DNAString subject
subject: -CTC-N
views:
start end width
[1] 0 8 9 [ -CTC-N ]
[2] 1 7 7 [-CTC-N ]
[3] 2 6 5 [CTC-N]
#@ The returned object is still an XStringViews object, even if we select only one element.
#@ You need to use double brackets to extract a given view as an XString object:
v4[[2]]
5-letter "DNAString" instance
seq: CTC-N
#@ You cannot extract a view that is out of limits:
v4[[3]]
Error in getListElement(x, i, …) : view is out of limits
#@ Note that, when start and end are numeric vectors and i is a single integer, Views(b, start, end)[[i]] is equivalent to subseq(b, start[i], end[i]).
#@ Subsetting also works with negative or logical values, with the expected semantics (the same as for R built-in vectors):
v4[-3]
Views on a 6-letter DNAString subject
subject: -CTC-N
views:
start end width
[1] 3 5 3 [TC-]
[2] 2 6 5 [CTC-N]
[3] 0 8 9 [ -CTC-N ]
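The equivalence noted above between Views(b, start, end)[[i]] and subseq(b, start[i], end[i]) can be checked with a short sketch. It assumes the Biostrings package is installed; the vectors starts and ends and the index i are illustrative names chosen here so that both views stay within the limits of b.

```r
library(Biostrings)  # assumes the Bioconductor package Biostrings is installed

b <- BString("I am a BString object")

# Illustrative start and end positions (both views are within limits)
starts <- c(3L, 8L)
ends <- c(6L, 14L)
i <- 2L

# Extract the i-th view as an XString object ...
via_views <- Views(b, start = starts, end = ends)[[i]]

# ... and extract the same range directly with subseq
via_subseq <- subseq(b, starts[i], ends[i])

same_letters <- toString(via_views) == toString(via_subseq)
```

With these positions, both objects should hold the letters 8 to 14 of b.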
v4[c(TRUE, FALSE)]
Views on a 6-letter DNAString subject
subject: -CTC-N
views:
start end width
[1] 3 5 3 [TC-]
[2] 1 7 7 [-CTC-N ]
#@ Note that the logical vector is recycled to the length of v4.
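Since the notes state that negative and logical subscripts behave as they do for R built-in vectors, the same selections can be reproduced on a plain integer vector. This is only a base R illustration that uses the start positions of v4 shown above; it does not call Biostrings.

```r
# Start positions of the four views in v4, copied from the output above
starts <- c(3L, 2L, 1L, 0L)

# A negative subscript drops the element at that position
dropped_third <- starts[-3]        # 3 2 0

# A logical subscript shorter than the vector is recycled:
# c(TRUE, FALSE) is treated as c(TRUE, FALSE, TRUE, FALSE)
every_other <- starts[c(TRUE, FALSE)]  # 3 1
```

The resulting values match the start column of v4[-3] and v4[c(TRUE, FALSE)] in the session above.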
#@ To display all the views in v12 (a set of width-3 views on the DNAString subject TAATAATG) that are equal to a given view, you can use a compact R idiom such as:
v12[v12 == v12[4]]
Views on a 8-letter DNAString subject
subject: TAATAATG
views:
start end width
[1] 1 3 3 [TAA]
[2] 4 6 3 [TAA]
v12[v12 == v12[1]]
Views on a 8-letter DNAString subject
subject: TAATAATG
views:
start end width
[1] -2 0 3 [ ]
[2] 9 11 3 [ ]
#@ This comparison is stated to be TRUE, although the run recorded here returned FALSE:
v12[3] == Views(RNAString("AU"), start=0, end=2)
[1] FALSE
4.6 The start, end and width methods
start(v4)
[1] 3 2 1 0
end(v4)
[1] 5 6 7 8
width(v4)
[1] 3 5 7 9
#@ Note that start(v4)[i] is equivalent to start(v4[i]), except that the former will not issue an error if i is out of bounds (the same holds for the end and width methods).
#@ Also, when i is a single integer, width(v4)[i] is equivalent to length(v4[[i]]), except that the former will not issue an error if i is out of bounds or if view v4[i] is out of limits.
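The two equivalences above can be checked for a view that is within limits. This is a minimal sketch that assumes the Biostrings package is installed and rebuilds v4 as in section 4.3; the index i is an illustrative choice.

```r
library(Biostrings)  # assumes the Bioconductor package Biostrings is installed

dd2 <- DNAString("-CTC-N")
v4 <- Views(dd2, start = 3:0, end = 5:8)

i <- 2L  # view 2 (positions 2 to 6) lies within the limits of dd2

# start(v4)[i] and start(v4[i]) give the same value
same_start <- start(v4)[i] == start(v4[i])

# width(v4)[i] and length(v4[[i]]) give the same value
same_width <- width(v4)[i] == length(v4[[i]])

# Indexing the plain integer vector past its end yields NA, not an error
beyond <- start(v4)[10]
```

The last line relies only on ordinary vector indexing: start(v4) is an integer vector, so an out-of-bounds index returns NA.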