Deprecated Python Script API

This section lists the API of the knime_io module, which served as the main contact point between KNIME and Python in the Python Script node of KNIME AP before version 4.7, when the Python Script node was moved out of Labs. Please refer to the KNIME Python Integration Guide for details on how to set up and use the node.

Warning

This API has been deprecated since KNIME AP 4.7; please use the current API as described in Python Script API.

Inputs and outputs

These properties can be used to retrieve data from or pass data back to KNIME Analytics Platform. The length of the input and output lists depends on the number of input and output ports of the node.

Example: If you have a Python Script node configured with two input tables and one input object, you can access the two tables via knime_io.input_tables[0] and knime_io.input_tables[1], and the input object via knime_io.input_objects[0].

knime_io.flow_variables: Dict[str, Any] = {}

A dictionary of flow variables provided by the KNIME workflow. New flow variables can be added to the output of the node by adding them to the dictionary. Supported flow variable types are numbers, strings, booleans and lists thereof.
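
Example (a minimal sketch; the flow variable names used here are hypothetical):

# Read a flow variable provided by the workflow
threshold = knime_io.flow_variables["threshold"]

# Add a new flow variable to the output of the node
knime_io.flow_variables["num_filtered_rows"] = 42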

knime_io.input_objects: List = <knime.scripting._io_containers._FixedSizeListView object>

A list of input objects of this script node, using zero-based indices. This list has a fixed size, which is determined by the number of input object ports configured for this node. Input objects are Python objects that are passed in from another Python Script node's output object port (see knime_io.output_objects below). This can, for instance, be used to pass trained models between Python nodes. If no input is given, the list exists but is empty.
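
Example (a minimal sketch, assuming an upstream Python Script node assigned a trained model to its first output object port):

# Retrieve the model passed in from the upstream node
model = knime_io.input_objects[0]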

knime_io.input_tables: List[ReadTable] = <knime.scripting._io_containers._FixedSizeListView object>

The input tables of this script node. This list has a fixed size, which is determined by the number of input table ports configured for this node. Tables are available in the same order as the port connectors are displayed alongside the node (from top to bottom), using zero-based indexing. If no input is given, the list exists but is empty.
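
Example (a minimal sketch of materializing the first input table in memory):

df = knime_io.input_tables[0].to_pandas()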

knime_io.output_images: List = <knime.scripting._io_containers._FixedSizeListView object>

The output images of this script node. This list has a fixed size, which is determined by the number of output images configured for this node. The value passed to the output port should be an array of bytes encoding an SVG or PNG image.

Example:

import io

from matplotlib import pyplot

data = knime_io.input_tables[0].to_pandas()
buffer = io.BytesIO()

# Plot the data and write the figure into an in-memory buffer as SVG
pyplot.figure()
pyplot.plot('x', 'y', data=data)
pyplot.savefig(buffer, format='svg')

knime_io.output_images[0] = buffer.getvalue()

knime_io.output_objects: List = <knime.scripting._io_containers._FixedSizeListView object>

The output objects of this script node. This list has a fixed size, which is determined by the number of output object ports configured for this node. Each output object can be an arbitrary Python object as long as it can be pickled. Use this to, for example, pass a trained model to another Python script node.

Example:

import torchvision

model = torchvision.models.resnet18()
...
# train/finetune model
...
knime_io.output_objects[0] = model

knime_io.output_tables: List[WriteTable] = <knime.scripting._io_containers._FixedSizeListView object>

The output tables of this script node. This list has a fixed size, which is determined by the number of output table ports configured for this node. You should assign a WriteTable or BatchWriteTable to each output port of this node. See the factory methods knime_io.write_table() and knime_io.batch_write_table() below.

Example:

knime_io.output_tables[0] = knime_io.write_table(my_pandas_df)

Factory methods

Use these methods to fill the knime_io.output_tables.

knime_io.batch_write_table() → BatchWriteTable

Factory method to create an empty BatchWriteTable that can be filled sequentially batch by batch (see Example).

Example:

table = knime_io.batch_write_table()
table.append(df_1)
table.append(df_2)
knime_io.output_tables[0] = table

Warning

This method has been deprecated since KNIME AP 4.7; use knime.api.table.BatchOutputTable.create() instead.

knime_io.write_table(data: ReadTable | pandas.DataFrame | pyarrow.Table, sentinel: str | int | None = None) → WriteTable

Factory method to create a WriteTable given a pandas.DataFrame or a pyarrow.Table. If the input is a pyarrow.Table, its first column must contain unique row identifiers of type ‘string’.

Example:

knime_io.output_tables[0] = knime_io.write_table(my_pandas_df, sentinel="min")

Parameters:
  • data – A ReadTable, pandas.DataFrame or a pyarrow.Table

  • sentinel

    Interpret the following values in integral columns as missing values:

    • "min": min int32 or min int64, depending on the type of the column

    • "max": max int32 or max int64, depending on the type of the column

    • a special integer value that should be interpreted as missing value

Warning

This method has been deprecated since KNIME AP 4.7; use knime.api.table.Table.from_pandas() or knime.api.table.Table.from_pyarrow() instead.

Classes

class knime.scripting._deprecated._table.Batch

A batch is a part of a table containing data. A batch should always fit into system memory; thus, all methods accessing the data are processed immediately and synchronously.

It can be sliced before the data is accessed as pandas.DataFrame or pyarrow.RecordBatch.

__getitem__(slicing: slice | Tuple[slice, slice | List[int] | List[str]]) → SlicedDataView

Creates a view of this batch by slicing specific rows and columns. The slicing syntax is similar to that of numpy arrays, but columns can also be addressed as index lists or via a list of column names.

Parameters:
  • row_slice – A slice object describing which rows to use.

  • column_slice – Optional. A slice object, a list of column indices, or a list of column names.

Returns:

A SlicedDataView that can be converted to pandas or pyarrow.

Example:

full_batch = batch[:] # Slice/Get the full batch

# Slicing works for rows and columns. Column slices can be defined with ints or with column names
row_sliced_batch = batch[:100] # Get first 100 rows of the batch
column_sliced_batch = batch[:, ["name", "age"]] # Get all rows of the columns "name" and "age"
row_and_column_sliced_batch = batch[:100, 1:5] # Get the first 100 rows of columns 1,2,3,4

# The resulting sliced batches cannot be sliced further, but they can be converted to pandas or pyarrow.

abstract property column_names: Tuple[str, ...]

Returns the list of column names.

abstract property num_columns: int

Returns the number of columns in the table.

abstract property num_rows: int

Returns the number of rows in the table.

If the table is not completely available yet because batches are still appended to it, querying the number of rows blocks until all data is available.

property shape: Tuple[int, int]

Returns a tuple in the form (numRows, numColumns) representing the shape of this table.

If the table is not completely available yet because batches are still appended to it, querying the shape blocks until all data is available.

abstract to_pandas(sentinel: str | int | None = None) → pandas.DataFrame

Access the batch or table as a pandas.DataFrame.

Parameters:

sentinel

Replace missing values in integral columns by the given value, one of:

  • "min" min int32 or min int64 depending on the type of the column

  • "max" max int32 or max int64 depending on the type of the column

  • An integer value that should be inserted for each missing value

Raises:

IndexError – If rows or columns were requested outside of the available shape

abstract to_pyarrow(sentinel: str | int | None = None) → pyarrow.RecordBatch | pyarrow.Table

Access this batch or table as a pyarrow.RecordBatch or pyarrow.Table. The returned type depends on the type of the underlying object. When called on a ReadTable, returns a pyarrow.Table.

Parameters:

sentinel

Replace missing values in integral columns by the given value, one of:

  • "min" min int32 or min int64 depending on the type of the column

  • "max" max int32 or max int64 depending on the type of the column

  • An integer value that should be inserted for each missing value

Raises:

IndexError – If rows or columns were requested outside of the available shape

class knime.scripting._deprecated._table.ReadTable

A KNIME ReadTable provides access to the data provided from KNIME, either in full (must fit into memory) or split into row-wise batches.

Warning

This class has been deprecated since KNIME AP 4.7; use knime.api.table.Table instead.

__getitem__(slicing: slice | Tuple[slice, slice | List[int] | List[str]]) → SlicedDataView

Creates a view of this ReadTable by slicing rows and columns. The slicing syntax is similar to that of numpy arrays, but columns can also be addressed as index lists or via a list of column names.

The returned sliced table cannot be sliced further, but it can be converted to pandas or pyarrow.

Parameters:
  • row_slice – A slice object describing which rows to use.

  • column_slice – Optional. A slice object, a list of column indices, or a list of column names.

Returns:

a SlicedDataView that can be converted to pandas or pyarrow.

Example:

row_sliced_table = table[:100] # Get the first 100 rows
column_sliced_table = table[:, ["name", "age"]] # Get all rows of the columns "name" and "age"
row_and_column_sliced_table = table[:100, 1:5] # Get the first 100 rows of columns 1,2,3,4

df = row_and_column_sliced_table.to_pandas()

__len__() → int

Returns the number of batches in this table.

abstract batches() → Iterator[Batch]

Returns a generator over the batches in this table. If the generator is advanced to a batch that is not available yet, it blocks until the data is present. len(my_read_table) gives the static number of batches within the table, which is not updated.

Example:

processed_table = knime_io.batch_write_table()
for batch in knime_io.input_tables[0].batches():
    input_batch = batch.to_pandas()
    # process the batch
    processed_table.append(input_batch)

abstract property column_names: Tuple[str, ...]

Returns the list of column names.

abstract property num_batches: int

Returns the number of batches in this table.

If the table is not completely available yet because batches are still appended to it, querying the number of batches blocks until all data is available.

abstract property num_columns: int

Returns the number of columns in the table.

abstract property num_rows: int

Returns the number of rows in the table.

If the table is not completely available yet because batches are still appended to it, querying the number of rows blocks until all data is available.

property shape: Tuple[int, int]

Returns a tuple in the form (numRows, numColumns) representing the shape of this table.

If the table is not completely available yet because batches are still appended to it, querying the shape blocks until all data is available.

abstract to_pandas(sentinel: str | int | None = None) → pandas.DataFrame

Access the batch or table as a pandas.DataFrame.

Parameters:

sentinel

Replace missing values in integral columns by the given value, one of:

  • "min" min int32 or min int64 depending on the type of the column

  • "max" max int32 or max int64 depending on the type of the column

  • An integer value that should be inserted for each missing value

Raises:

IndexError – If rows or columns were requested outside of the available shape
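
Example (a minimal sketch, assuming the first input table contains integral columns with missing values):

# Map missing values in integer columns to the minimum value of the column's integer type
df = knime_io.input_tables[0].to_pandas(sentinel="min")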

abstract to_pyarrow(sentinel: str | int | None = None) → pyarrow.RecordBatch | pyarrow.Table

Access this batch or table as a pyarrow.RecordBatch or pyarrow.Table. The returned type depends on the type of the underlying object. When called on a ReadTable, returns a pyarrow.Table.

Parameters:

sentinel

Replace missing values in integral columns by the given value, one of:

  • "min" min int32 or min int64 depending on the type of the column

  • "max" max int32 or max int64 depending on the type of the column

  • An integer value that should be inserted for each missing value

Raises:

IndexError – If rows or columns were requested outside of the available shape
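
Example (a minimal sketch of accessing the first input table as a pyarrow.Table):

pa_table = knime_io.input_tables[0].to_pyarrow()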

class knime.scripting._deprecated._table.WriteTable

A table that can be filled as a whole.

Warning

This class has been deprecated since KNIME AP 4.7; use knime.api.table.Table instead.

abstract property column_names: Tuple[str, ...]

Returns the list of column names.

abstract property num_batches: int

Returns the number of batches in this table.

If the table is not completely available yet because batches are still appended to it, querying the number of batches blocks until all data is available.

abstract property num_columns: int

Returns the number of columns in the table.

abstract property num_rows: int

Returns the number of rows in the table.

If the table is not completely available yet because batches are still appended to it, querying the number of rows blocks until all data is available.

property shape: Tuple[int, int]

Returns a tuple in the form (numRows, numColumns) representing the shape of this table.

If the table is not completely available yet because batches are still appended to it, querying the shape blocks until all data is available.

class knime.scripting._deprecated._table.BatchWriteTable

A table that can be filled batch by batch.

Warning

This class has been deprecated since KNIME AP 4.7; use knime.api.table.BatchOutputTable instead.

abstract append(data: Batch | pandas.DataFrame | pyarrow.RecordBatch, sentinel: str | int | None = None)

Appends a batch with the given data to the end of this table. The number of columns, as well as their data types, must match those of the previous batches in this table. Note that this method cannot take a pyarrow.Table as input; with pyarrow, it can only process record batches, which can, for instance, be obtained from a pyarrow.Table via its to_batches() method (see the second sketch below).

Example:

processed_table = knime_io.batch_write_table()
for batch in knime_io.input_tables[0].batches():
    input_batch = batch.to_pandas()
    # process the batch
    processed_table.append(input_batch)
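
A minimal sketch of the pyarrow route; pyarrow.Table.to_batches() splits a table into pyarrow.RecordBatch objects that append() accepts:

processed_table = knime_io.batch_write_table()
pa_table = knime_io.input_tables[0].to_pyarrow()

# Split the pyarrow.Table into RecordBatch objects and append them one by one
for record_batch in pa_table.to_batches():
    processed_table.append(record_batch)

knime_io.output_tables[0] = processed_table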

Parameters:
  • data – A batch, a pandas.DataFrame or a pyarrow.RecordBatch

  • sentinel

    Only if data is a pandas.DataFrame or pyarrow.RecordBatch. Interpret the following values in integral columns as missing values:

    • "min": min int32 or min int64, depending on the type of the column

    • "max": max int32 or max int64, depending on the type of the column

    • a special integer value that should be interpreted as missing value

Raises:

ValueError – If the new batch does not have the same columns as previous batches in this WriteTable.

abstract property column_names: Tuple[str, ...]

Returns the list of column names.

static create() → BatchWriteTable

Creates an empty BatchWriteTable.

abstract property num_batches: int

Returns the number of batches in this table.

If the table is not completely available yet because batches are still appended to it, querying the number of batches blocks until all data is available.

abstract property num_columns: int

Returns the number of columns in the table.

abstract property num_rows: int

Returns the number of rows in the table.

If the table is not completely available yet because batches are still appended to it, querying the number of rows blocks until all data is available.

property shape: Tuple[int, int]

Returns a tuple in the form (numRows, numColumns) representing the shape of this table.

If the table is not completely available yet because batches are still appended to it, querying the shape blocks until all data is available.