Class InterningRowBuilder
              object --+
                       |
tokenizer.RowBuilder --+
                       |
                      InterningRowBuilder
This subclass of the tokenizer.RowBuilder class respects the
"interning" hints provided by table definitions, and attempts
to replace the values of row attributes associated with interned columns
with references to shared instances of those values. The resulting
reduction in memory use is small for most documents, but can be
substantial for tables containing large volumes of repeated
information.
Example:
>>> class Row(object):
... pass
...
>>>
>>> rows = InterningRowBuilder(Row, ["name", "age"], ("name",))
>>> l = list(rows.append(["Dick", 20., "Jane", 75., "Dick", 22.]))
>>> l[0].name
'Dick'
>>> l[2].name
'Dick'
>>> l[2].name is l[0].name
True
Note that Python already shares identical short string literals, so the
comparison in this example would return True even without this class; it
is intended only to demonstrate the use of the class.
The values are stored in a dictionary that is shared between all
instances of this class and persists for the lifetime of the process.
Nothing is ever "uninterned" automatically, so the string dictionary
grows without bound as more documents are processed. This can be a
problem in some use cases, and the work-around is to run
>>> InterningRowBuilder.strings.clear()
to reset the dictionary at appropriate points in the application.
Typically this would be done immediately after each document is
loaded.
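The shared-dictionary behaviour described above can be illustrated with a
minimal sketch. This is not the library's implementation: InterningSketch,
intern_row and make_row are hypothetical names, and the only thing borrowed
from the text is the idea of a class-level dictionary named strings that is
shared by all instances and emptied with clear().

```python
class InterningSketch(object):
    # Hypothetical stand-in for illustration only. The class-level
    # dictionary is shared by every instance, as the text describes.
    strings = {}

    def __init__(self, interned_columns):
        self.interned_columns = frozenset(interned_columns)

    def intern_row(self, row):
        # Replace each interned attribute with the first equal value seen.
        for name in self.interned_columns:
            value = getattr(row, name)
            setattr(row, name, self.strings.setdefault(value, value))
        return row


class Row(object):
    pass


def make_row(name, age):
    # Hypothetical helper that builds a row with two attributes.
    row = Row()
    row.name = name
    row.age = age
    return row


builder = InterningSketch(("name",))
# Build the names at run time so they are equal but separately created.
first = builder.intern_row(make_row("".join(["Di", "ck"]), 20.0))
second = builder.intern_row(make_row("".join(["Di", "ck"]), 22.0))
shares_instance = first.name is second.name

# The dictionary keeps growing until it is reset explicitly.
count_before_clear = len(InterningSketch.strings)
InterningSketch.strings.clear()
```

Because the dictionary lives on the class rather than on an instance,
clearing it affects every builder at once, which is why the reset is best
done at a point where no document is being loaded.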
append(self, tokens)
Append a sequence of tokens to the row builder, returning an iterator
for generating a sequence of new row instances.
Inherited from tokenizer.RowBuilder: __init__, __iter__, __new__, next
Inherited from object: __delattr__, __format__, __getattribute__,
__hash__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__,
__str__, __subclasshook__
Append a sequence of tokens to the row builder, returning an iterator
for generating a sequence of new row instances. The tokens argument
should be an iterable, producing a sequence of token objects. If fewer
tokens are yielded from the iterable than are required to construct a
complete row, then the row is stored in its partially-populated state and
its construction will continue upon the next invocation. Note that it is
possible that a call to this method will yield no new rows at all.
Example:
>>> for row in rows.append([10, 6.8, 15, 29.1]):
... print row.snr
...
Overrides: tokenizer.RowBuilder.append (inherited documentation)
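The partial-row behaviour of append can be sketched as follows. This is a
simplified illustration under stated assumptions, not the code of
tokenizer.RowBuilder: PartialRowSketch and its attributes are hypothetical
names, and the sketch assumes each returned iterator is consumed fully
before append is called again.

```python
class PartialRowSketch(object):
    # Hypothetical stand-in for illustration only.
    def __init__(self, rowtype, attributes):
        self.rowtype = rowtype
        self.attributes = tuple(attributes)
        self.row = rowtype()   # row currently being populated
        self.i = 0             # index of the next attribute to fill

    def append(self, tokens):
        # Generator: rows are only built as the caller iterates.
        for token in tokens:
            setattr(self.row, self.attributes[self.i], token)
            self.i += 1
            if self.i == len(self.attributes):
                # The row is complete: hand it out and start a new one.
                yield self.row
                self.row = self.rowtype()
                self.i = 0


class Row(object):
    pass


rows = PartialRowSketch(Row, ["name", "age"])
# Three tokens: one complete row, and "Jane" is kept in a partial row.
first_batch = list(rows.append(["Dick", 20.0, "Jane"]))
# One more token completes the pending row.
second_batch = list(rows.append([75.0]))
# No tokens, so no new rows are produced.
third_batch = list(rows.append([]))
```

Keeping the partially-populated row on the builder between calls is what
lets the token stream be split at arbitrary points, for example when a
document is read in fixed-size chunks.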