Migrating Chemical Structures: ISIS for Excel to JChem for Excel Converter
Migrating chemical structure data between cheminformatics tools can be deceptively complex. Moving from ISIS for Excel (the Excel add-in from the legacy MDL ISIS suite) to JChem for Excel requires not only format conversion but also careful handling of structure integrity, metadata, stereochemistry, and large spreadsheets. This article walks through why migration may be necessary, common challenges, available conversion strategies, step‑by‑step procedures, validation checks, and best practices to ensure a smooth transition.
Why migrate from ISIS for Excel to JChem for Excel?
- Modern support and maintenance: JChem for Excel is actively developed and integrated with modern cheminformatics ecosystems, whereas ISIS for Excel is legacy software and may lack current support.
- Improved performance and features: JChem offers advanced structure searching, better compatibility with contemporary chemical databases, and integration with other ChemAxon tools.
- Enterprise workflows: Organizations consolidating on ChemAxon platforms often standardize on JChem to streamline deployments, automation, and reporting.
Key challenges in conversion
- Structure representation differences: ISIS and JChem may store structure objects and attachments differently inside Excel cells (embedded objects, compressed binary formats, or SMILES/Molfile text).
- Loss of metadata: Annotations, custom properties, or column-level metadata may be stored in add-in-specific fields or hidden worksheets.
- Stereochemistry and query features: Query bonds, R/S stereocenters, and enhanced stereochemistry annotations might not translate one-to-one.
- Large spreadsheets and performance: Files with thousands of embedded structures can be slow to process; conversion tools must handle memory, batching, and error recovery.
- Version compatibility: Different versions of ISIS for Excel and JChem for Excel may affect available features and conversion behavior.
Pre-migration planning
- Inventory files: List workbooks and sheets that contain structure data. Note Excel formats (.xls, .xlsx) and approximate sizes.
- Identify structure storage method: Determine if structures are stored as embedded OLE objects, as molfiles/SMILES in cells, or as add-in-specific fields. A quick way is to inspect a sample cell: if it shows a structure image that’s not plain text, it’s likely an embedded object.
- Backup originals: Keep read-only copies of all original files.
- Define required outcomes: Decide whether you need a faithful structural match, preservation of annotations, or an opportunity to clean up data (normalize tautomers, remove salts, etc.).
- Choose a test set: Pick representative files (small, medium, large; containing stereochemistry, queries, and custom metadata) to validate the process.
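The inventory step above can be sketched with the Python standard library. This is a minimal illustration, not a prescribed tool: the function name `inventory_workbooks` and the set of file extensions are assumptions, and the script only lists files; it does not open them or detect structures.

```python
from pathlib import Path

# Assumed set of Excel extensions worth inventorying; adjust as needed.
EXCEL_SUFFIXES = {".xls", ".xlsx", ".xlsm"}


def inventory_workbooks(root):
    """Return (path, suffix, size_in_bytes) for every Excel file under root."""
    rows = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in EXCEL_SUFFIXES:
            rows.append((str(path), path.suffix.lower(), path.stat().st_size))
    return rows
```

Calling `inventory_workbooks("some/shared/folder")` returns a list that can be written to a CSV and used to pick small, medium, and large files for the test set.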
Conversion strategies
- Manual export/import: Use ISIS for Excel to export structures as standard formats (SMILES, InChI, Molfile) and then import into JChem for Excel. This is straightforward for small datasets but tedious at scale.
- Automated conversion tools/scripts: Use batch scripts (VBA, Python with libraries, or command-line utilities) to extract structure files and re-import them.
- Dedicated converter utilities: Some vendors or third parties may offer conversion utilities specifically for ISIS→JChem migration. These tools often preserve embedded metadata and handle batch processing.
- Hybrid approach: Export structures to an intermediate standard (e.g., SDF), perform cleanup/normalization, then import into JChem for Excel.
Step‑by‑step conversion (recommended automated workflow)
- Install required software:
  - A working copy of ISIS for Excel (for export access) or an environment that can read the files.
  - JChem for Excel installed on the target machine.
  - A scripting environment (Python recommended) with RDKit or Open Babel for structure handling.
- Identify structure columns:
  - Programmatically scan worksheets for cells containing OLE objects or typical ISIS add-in markers. To detect OLE objects from Python, use a library that can parse Excel binary objects, or drive Excel itself through automation (win32com on Windows).
- Export structures to SDF/SMILES:
  - From ISIS for Excel: use the add-in export to write structures into a multi‑record SDF or a CSV with SMILES/InChI columns.
  - If ISIS cannot export in batch, use a script to open each workbook, extract the OLE objects, save them as molfile text, and write them to an SDF.
- Normalize and validate:
  - Run the exported structures through RDKit, Open Babel, or ChemAxon tools to standardize tautomers, strip counterions, neutralize charges, and validate valences.
  - Generate canonical SMILES or InChIKeys for deduplication. Typical RDKit pipeline steps are sanitization, kekulization (if needed), explicit hydrogen handling, and InChI generation.
- Map metadata:
  - Preserve column metadata by exporting adjacent columns into SDF properties or a CSV mapping file. Ensure field names don’t conflict with JChem reserved fields.
- Import into JChem for Excel:
  - Use JChem for Excel’s import function to read SDF or CSV+SMILES. For large datasets, import in batches to avoid Excel memory issues.
  - Alternatively, use ChemAxon's command-line tools (for example, molconvert for format conversion or jcman for loading JChem database tables) to bulk-load data into a database or SD file, then link it into Excel via JChem functions.
- Verify and reconcile:
  - Spot-check critical structures (stereocenters, query features) visually in JChem for Excel. Compare InChIKeys/SMILES between source and target for automated verification.
  - Check metadata columns for completeness and correct mapping.
- Finalize and archive:
  - Save converted workbooks in the modern .xlsx format where possible. Keep the original files archived for audit and rollback.
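The normalization step in this workflow could look roughly like the following RDKit sketch. It assumes RDKit is installed with InChI support and that keeping the largest fragment and neutralizing charges is the cleanup you want; tautomer canonicalization is deliberately left out. The function names and the `NORM_SMILES` / `NORM_INCHIKEY` field names are illustrative, not part of any tool.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def normalize(mol):
    """Return (clean_mol, canonical_smiles, inchikey) for one RDKit molecule."""
    mol = rdMolStandardize.Cleanup(mol)                          # sanitize, normalize groups
    mol = rdMolStandardize.LargestFragmentChooser().choose(mol)  # drop counterions/solvents
    mol = rdMolStandardize.Uncharger().uncharge(mol)             # neutralize where possible
    return mol, Chem.MolToSmiles(mol), Chem.MolToInchiKey(mol)


def normalize_sdf(in_path, out_path):
    """Standardize every record of in_path into out_path; return (written, skipped)."""
    written = skipped = 0
    writer = Chem.SDWriter(out_path)
    try:
        for mol in Chem.SDMolSupplier(in_path):
            if mol is None:  # record RDKit could not parse
                skipped += 1
                continue
            # Remember the title and data fields so they can be re-attached.
            name = mol.GetProp("_Name") if mol.HasProp("_Name") else ""
            props = {k: mol.GetProp(k) for k in mol.GetPropNames()}
            try:
                clean, smiles, inchikey = normalize(mol)
            except Exception:  # standardization failed; count it and move on
                skipped += 1
                continue
            clean.SetProp("_Name", name)
            for key, value in props.items():
                clean.SetProp(key, value)
            clean.SetProp("NORM_SMILES", smiles)      # hypothetical field name
            clean.SetProp("NORM_INCHIKEY", inchikey)  # hypothetical field name
            writer.write(clean)
            written += 1
    finally:
        writer.close()
    return written, skipped
```

Because salt stripping changes the stored structure, run this only if cleanup is one of the outcomes defined during planning; for a faithful one-to-one migration, compute the identifiers but write the original molecule.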
Validation checklist
- Structure identity: Compare InChI/InChIKey or canonical SMILES between source and converted structures.
- Stereochemistry: Verify that chiral centers and cis/trans designations are preserved.
- Query features: Determine whether any source structures use query bonds or wildcard atoms; if so, review how JChem represents them and adjust the converted structures accordingly.
- Metadata integrity: Ensure all custom properties and column data were carried over.
- Visual fidelity: Open a random sample of structures in the JChem drawer to visually confirm rendering.
- Count consistency: Row counts and non-empty structure cells should match pre- and post-migration.
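The identity and count checks above can be automated by comparing identifiers exported from both sides. The sketch below assumes each side has been exported to a CSV with `id` and `inchikey` columns (both column names are assumptions) and uses only the standard library. It relies on the InChIKey layout: the first 14 characters encode connectivity, so keys that differ only after that block point to a stereochemistry, isotope, or protonation change rather than a different skeleton.

```python
import csv


def load_keys(csv_path, id_col="id", key_col="inchikey"):
    """Read {record_id: InChIKey} from a CSV export (column names are assumed)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row[id_col]: row[key_col].strip() for row in csv.DictReader(f)}


def compare_keys(source, target):
    """Compare two {record_id: InChIKey} dicts and classify the differences."""
    shared = source.keys() & target.keys()
    changed = sorted(i for i in shared if source[i] != target[i])
    # Same first block (14 chars) means same connectivity; only later layers differ.
    layer_changed = [i for i in changed if source[i][:14] == target[i][:14]]
    connectivity_changed = [i for i in changed if source[i][:14] != target[i][:14]]
    return {
        "missing": sorted(source.keys() - target.keys()),  # lost during migration
        "extra": sorted(target.keys() - source.keys()),    # appeared unexpectedly
        "connectivity_changed": connectivity_changed,
        "layer_changed": layer_changed,
    }
```

An empty report (all four lists empty) covers the structure identity and count consistency items; anything in `layer_changed` is a good candidate for the visual stereochemistry check.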
Troubleshooting common problems
- Missing structures after import: Check whether the export step produced empty records or whether JChem import filters out invalid structures—inspect logs.
- Altered stereochemistry: If a SMILES-based export loses stereochemistry (for example, because non-isomeric SMILES were written), export as molfile V2000/V3000 or InChI instead to preserve the stereo information.
- Performance issues in Excel: Split very large datasets into multiple workbook tabs or use a chemical database backend accessed by JChem rather than storing thousands of structures in a single workbook.
- Metadata name collisions: Rename problematic columns before import or map fields explicitly during SDF creation.
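For the performance case, one option is to split a large SDF into smaller files before importing each one separately. The sketch below is a plain-text splitter that relies only on the `$$$$` record terminator defined by the SDF format; the function names, the output file naming scheme, and the default batch size of 500 are assumptions to tune for your machine.

```python
from pathlib import Path


def iter_sdf_records(path):
    """Yield each SDF record as text, including its '$$$$' terminator line."""
    record = []
    with open(path, encoding="utf-8", errors="replace", newline="") as f:
        for line in f:
            record.append(line)
            if line.strip() == "$$$$":
                yield "".join(record)
                record = []
    if any(line.strip() for line in record):  # trailing record without terminator
        yield "".join(record)


def _write_batch(out_dir, stem, index, records):
    out_path = out_dir / f"{stem}_part{index:03d}.sdf"
    with open(out_path, "w", encoding="utf-8", newline="") as f:
        f.write("".join(records))
    return out_path


def split_sdf(in_path, out_dir, records_per_file=500):
    """Split in_path into files of at most records_per_file records each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(in_path).stem
    paths, batch = [], []
    for record in iter_sdf_records(in_path):
        batch.append(record)
        if len(batch) == records_per_file:
            paths.append(_write_batch(out_dir, stem, len(paths) + 1, batch))
            batch = []
    if batch:  # write the final, possibly smaller, batch
        paths.append(_write_batch(out_dir, stem, len(paths) + 1, batch))
    return paths
```

Because records are copied as text without parsing, structures and data fields pass through unchanged, which keeps this step out of the validation scope.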
Sample Python (RDKit) snippet — export SDF from SMILES CSV
```python
# requires rdkit: conda install -c conda-forge rdkit
import csv

from rdkit import Chem
from rdkit.Chem import AllChem, SDWriter

input_csv = "structures.csv"  # columns: id,smiles,prop1,prop2
out_sdf = "exported_structures.sdf"

writer = SDWriter(out_sdf)
with open(input_csv, newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        mol = Chem.MolFromSmiles(row['smiles'])
        if mol is None:
            # Report rows RDKit cannot parse and skip them
            print("Invalid SMILES:", row['id'])
            continue
        AllChem.Compute2DCoords(mol)  # 2D layout so the structure renders
        mol.SetProp('_Name', row['id'])
        # Copy the remaining non-empty columns into SDF data fields
        for k, v in row.items():
            if k not in ('id', 'smiles') and v:
                mol.SetProp(k, v)
        writer.write(mol)
writer.close()
```
Best practices and recommendations
- Use standard formats (SDF, Molfile, SMILES, InChI) as intermediates — they are well-understood and preserve chemical detail better than proprietary embedded objects.
- Automate and log every step — keep detailed logs for traceability and to make rollback easier if something goes wrong.
- Keep a canonical identifier (InChIKey) for each molecule to detect duplicates and ensure identity post-migration.
- Maintain a mapping document recording how each source column maps to destination fields.
- Consider centralizing large datasets in a chemical database (JChem DB, PostgreSQL with RDKit cartridge) and using Excel as a front-end rather than the primary data store.
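The mapping document recommended above can be generated rather than typed by hand. The sketch below builds a source-to-destination field mapping, replaces characters outside letters, digits, and underscores, and resolves name collisions by appending a counter. The sanitizing rule, the case-insensitive collision check, and the `reserved` argument are assumptions; the actual list of reserved JChem field names has to come from the ChemAxon documentation.

```python
import csv
import re


def build_field_mapping(source_columns, reserved=()):
    """Return [(source_name, destination_name)] with unique destination names."""
    taken = {name.lower() for name in reserved}  # names the target must not reuse
    mapping = []
    for col in source_columns:
        base = re.sub(r"[^A-Za-z0-9_]+", "_", col.strip()).strip("_") or "field"
        candidate, n = base, 1
        while candidate.lower() in taken:  # collision: try base_2, base_3, ...
            n += 1
            candidate = f"{base}_{n}"
        taken.add(candidate.lower())
        mapping.append((col, candidate))
    return mapping


def write_mapping(mapping, csv_path):
    """Save the mapping as a two-column CSV for the migration record."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["source_column", "destination_field"])
        writer.writerows(mapping)
```

Keeping the written CSV next to the archived originals documents exactly how each column was renamed during the migration.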
When to seek vendor or expert help
- If your workbooks use complex ISIS query features or custom scripting within the add-in.
- When dealing with regulatory or GLP data where absolute fidelity and audit trails are required.
- For very large enterprise migrations where downtime, validation, and integration with LIMS/ELN are critical.
Migrating from ISIS for Excel to JChem for Excel is a manageable process when planned and executed carefully. Using standard chemical formats, automated scripts, thorough validation, and adequate backups will minimize data loss and ensure a smooth transition to a modern cheminformatics platform.