Quickly iterating over large databases

I am working with a dataset in Excel that will regularly contain over 10,000 records. Given the size of the dataset, I do not want to read the entire set into a variable but rather access one record at a time. I currently use a column in Excel (RecNumber) that stores a one-up numbering system, and I use a where to pull in the record I want:
set i to 1
set z to the number of records in myDB
Repeat while i <= z
set myCurrentRecord to the record of myDB where RecNumber = i
set i=i+1
End Repeat

What seems to happen is that the script will take longer as i gets larger.

My questions:

  • Would I get a performance boost by moving to a database that has a true primary key? (e.g. Access, SQL Server)
  • Is there a way to iterate of records based on their record number with code like this:
    set myCurrentRecord to record i of myDB

Thanks, Dave

Hi DaveHester

I recon on a real Db you will get a perfomance boost. Because Excel it self is not really performant with large data sets. I recon the performance break in is not of reading data into the memory of smalltalk, I guess it is caused by excel it self.

On other hand you can try with normal excel access methods of smalltalk. When I read from excel I get the hole column I have the “primary key”, get the item index of these or repeat through all indexes and get the hole row.

set wbObject to Workbook("C:\myExcel.xlsx")
set wsObject to wbObject.worksheet("mySheet")
put wsObject.CellRange("A") into myCols
put the number of items of myCol into countCols
repeat with colNo = 1 to countCols
    put first item of wsObject.CellRange(colNo) into myRow
end repeat

Hey, Dave. We use the Primary Key “Hidden Column” in Excel to go to the record we want, not sure if it would speed things up for you though, but you don’t have to create a custom column called RecNumber.

Working with Excel (eggplantsoftware.com)

When an Excel connection is Writable, any records fetched will include a pkey property property value. SenseTalk uses this property value to uniquely identify each record that has changes and that should be written to the file.

Here is an example of how we use a sheet with a bunch of test patient data to debug a problem record.

//Setup myExcelDB data connection
Set myExcelDB = (type: "excel", file: ResourcePath("EpicData.xlsx"), writeable: "Yes")
Set myTable = table("TestPats") of myExcelDB

//Debug logic to run a specific record in TestPats
set myPK = ""  --set to _pk_ row# ex. 1001 is row 1

//Subset your records to the _pk_
set myPatients  = the records of myTable where _pk_ = myPK

Here is how you could use it. Remember to make your data connection writeable.

set i to 1001
set z to the number of records in myDB
set z to z + 1000 --need to add 1000 since _pk_ records start at 1001
Repeat while i <= z
set myCurrentRecord to the record of myDB where _pk_ = i
set i=i+1
End Repeat