"""
The ilwd:char type is used to store ID strings for objects within LIGO
Light-Weight XML files. This module and its associated C extension module
_ilwd provide a class for memory-efficient storage of ilwd:char strings.

LIGO Light-Weight XML "ilwd:char" IDs are strings of the form
"table:column:integer", for example "process:process_id:10". Large, complex
documents can contain many millions of these strings, and storing them
places a significant burden on RAM. However, while a document can contain
millions of ID strings, it might contain only a small number (e.g., 10 or
fewer) of unique ID prefixes (the table name and column name parts). The
amount of RAM required to load a document can be reduced significantly if
the small number of unique string prefixes is stored separately and
reused. This module provides the machinery used to do this.
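
As a rough illustration of the saving, the sketch below compares the two
storage strategies using CPython's sys.getsizeof(). It is not part of this
module: the sample IDs are hypothetical, per-object overhead is ignored,
and exact byte counts vary between interpreters and versions.

```python
import sys

# Hypothetical sample: 1000 IDs that all share one prefix.
prefix = "process:process_id:"
full_ids = [prefix + str(i) for i in range(1000)]

# Naive storage: every ID string carries its own copy of the prefix.
naive_bytes = sum(sys.getsizeof(s) for s in full_ids)

# Shared-prefix storage: one copy of the prefix plus one integer per ID.
shared_bytes = sys.getsizeof(prefix) + sum(
    sys.getsizeof(i) for i in range(1000))

print(naive_bytes, shared_bytes)
```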

The ilwdchar class in this module converts a string or unicode object
containing an ilwd:char ID into a more memory-efficient representation.

Example:

>>> x = ilwdchar("process:process_id:10")
>>> print(x)
process:process_id:10

Like strings, the resulting object is immutable. It provides two
read-only attributes, "table_name" and "column_name", that give access to
the table and column parts of the original ID string. The integer
suffix can be retrieved by converting the object to an integer.

Example:

>>> x.table_name
u'process'
>>> int(x)
10

The object also provides the read-only attribute "index_offset", giving the
length of the string preceding the integer suffix.

Example:

>>> x.index_offset
19

The objects support some arithmetic operations.

Example:

>>> y = x + 5
>>> str(y)
'process:process_id:15'
>>> int(y - x)
5

The objects are pickle-able.

Example:

>>> import pickle
>>> x == pickle.loads(pickle.dumps(x))
True

To simplify interaction with documents that do not contain fully-populated
columns, None is allowed as an input value and is not converted.

Example:

>>> print(ilwdchar(None))
None


Implementation details
======================

Memory use is reduced by storing the table_name, column_name, and
index_offset values as class attributes, so only one copy of each is held
in memory and shared across all instances of the class. This means that
each unique table_name and column_name pair requires its own class. These
classes are created on the fly as new IDs are processed, and are added to
this module's namespace. They are all subclasses of _ilwd.ilwdchar, which
implements the low-level machinery. After a new class is created it can be
accessed as a symbol in this module, but that symbol does not exist until
at least one corresponding ID string has been processed.

Example:

>>> from glue.ligolw import ilwd
>>> "foo_bar_class" in ilwd.__dict__
False
>>> x = ilwd.ilwdchar("foo:bar:0")
>>> type(x)
<class 'glue.ligolw.ilwd.foo_bar_class'>
>>> "foo_bar_class" in ilwd.__dict__
True
>>> print(ilwd.foo_bar_class(10))
foo:bar:10

The ilwdchar class itself is never instantiated. Its .__new__() method
parses the ID string parameter and creates an instance of the appropriate
subclass of _ilwd.ilwdchar, creating a new subclass first if necessary.
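
Pickle support for classes created on the fly can be arranged by
registering a reducer with the standard copyreg module, so that an instance
is stored as its string form and rebuilt through a factory. The sketch
below assumes Python 3 and uses hypothetical names (IdBase, make_id,
_class_cache); it illustrates the technique rather than this module's
exact code.

```python
import copyreg


# Hypothetical base class: each instance stores only the integer suffix.
class IdBase(object):
    __slots__ = ("value",)
    prefix = None

    def __init__(self, value):
        self.value = int(value)

    def __str__(self):
        return "%s%d" % (self.prefix, self.value)

    def __eq__(self, other):
        return type(self) is type(other) and self.value == other.value

    def __hash__(self):
        return hash((self.prefix, self.value))


_class_cache = {}


# Hypothetical factory: rebuilds an ID object from its string form.
def make_id(text):
    table, column, number = text.split(":")
    prefix = "%s:%s:" % (table, column)
    if prefix not in _class_cache:
        new_class = type("%s_%s_id" % (table, column), (IdBase,),
                         {"__slots__": (), "prefix": prefix})
        # Register a reducer for the new class.  Pickle then records
        # "call make_id() with this string" instead of a reference to
        # the dynamically created class, which it may be unable to look
        # up by name.
        copyreg.pickle(new_class, lambda obj: (make_id, (str(obj),)))
        _class_cache[prefix] = new_class
    return _class_cache[prefix](number)


x = make_id("process:process_id:10")
# Apply the registered reducer by hand to show what pickle would store.
factory, args = copyreg.dispatch_table[type(x)](x)
restored = factory(*args)
print(factory is make_id, args, restored == x)
```

pickle.dumps() and pickle.loads() follow the same path, provided the
factory function itself can be imported from a module.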
"""


import six
import six.moves.copyreg

from glue import git_version
from . import _ilwd


__author__ = "Kipp Cannon <kipp.cannon@ligo.org>"
__version__ = "git id %s" % git_version.id
__date__ = git_version.date


def get_ilwdchar_class(tbl_name, col_name, namespace=globals()):
    """
    Searches this module's namespace for a subclass of _ilwd.ilwdchar
    whose table_name and column_name attributes match those provided.
    If a matching subclass is found it is returned; otherwise a new
    class is defined, added to this module's namespace, and returned.

    Example:

    >>> process_id = get_ilwdchar_class("process", "process_id")
    >>> x = process_id(10)
    >>> str(type(x))
    "<class 'glue.ligolw.ilwd.process_process_id_class'>"
    >>> str(x)
    'process:process_id:10'

    Retrieving and storing the class provides a convenient mechanism
    for quickly constructing new ID objects.

    Example:

    >>> for i in range(10):
    ...     print(str(process_id(i)))
    ...
    process:process_id:0
    process:process_id:1
    process:process_id:2
    process:process_id:3
    process:process_id:4
    process:process_id:5
    process:process_id:6
    process:process_id:7
    process:process_id:8
    process:process_id:9
    """
    # If the class already exists, retrieve it and return it.
    key = six.text_type(tbl_name), six.text_type(col_name)
    cls_name = str("%s_%s_class" % key)
    assert cls_name != "get_ilwdchar_class"
    try:
        return namespace[cls_name]
    except KeyError:
        pass

    # Otherwise define a new class and add it to the namespace.  The
    # prefix is stored once, as class attributes, and the empty __slots__
    # adds no per-instance attributes.
    class new_class(_ilwd.ilwdchar):
        __slots__ = ()
        table_name, column_name = key
        index_offset = len(u"%s:%s:" % key)

    new_class.__name__ = cls_name

    namespace[cls_name] = new_class

    # Pickle support: instances are reduced to their string form and
    # rebuilt through ilwdchar() when unpickled.
    six.moves.copyreg.pickle(new_class, lambda x: (ilwdchar, (six.text_type(x),)))

    return new_class


class ilwdchar(object):
    """
    Factory wrapper for the glue.ligolw._ilwd.ilwdchar class.
    Instantiating this class constructs and returns an instance of a
    subclass of glue.ligolw._ilwd.ilwdchar.
    """
    def __new__(cls, s):
        """
        Convert an ilwd:char-formatted string into an instance of
        the matching subclass of _ilwd.ilwdchar.  If the input is
        None then the return value is None.

        Example:

        >>> x = ilwdchar(u"process:process_id:10")
        >>> str(x)
        'process:process_id:10'
        >>> x.table_name
        u'process'
        >>> x.column_name
        u'process_id'
        >>> int(x)
        10
        >>> x.index_offset
        19
        >>> str(x)[x.index_offset:]
        '10'
        >>> print(ilwdchar(None))
        None
        """
        # None is passed through unchanged.
        if s is None:
            return None

        # Parse the input as a "table:column:integer" string.
        try:
            table_name, column_name, i = s.strip().split(u":")
        except (ValueError, AttributeError):
            raise ValueError("invalid ilwd:char '%s'" % repr(s))

        # Retrieve the matching class, creating it if needed, and
        # return an instance initialized to the integer suffix.
        return get_ilwdchar_class(table_name, column_name)(int(i))