Current Limitations of Anovos
The current 1.0 release of Anovos still has some limitations, which we will address in upcoming releases. To learn more about what's on the horizon, check out our roadmap.
🔣 Data
- Anovos currently supports numerical, categorical, geospatial, and datetime/timestamp columns (at the cross-sectional and transactional level). We plan to add support for additional data types such as (struct) arrays in the future.
- Anovos currently relies on Apache Spark's automatic schema detection. If numerical columns were deliberately saved as strings, they will show up as categorical columns when loaded into a DataFrame (except for CSV files).
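
To decide which string columns might be worth casting back to a numerical type before handing the data to Anovos, one option is to check whether a small sample of their values parses as numbers. The following is only a minimal sketch in plain Python: the helper `looks_numeric` and the sample data are hypothetical and not part of Anovos or Spark. In a Spark DataFrame, the actual conversion could then be done with `withColumn` and `Column.cast("double")`.

```python
from typing import Iterable, Optional


def looks_numeric(values: Iterable[Optional[str]]) -> bool:
    """Return True if every non-missing value parses as a float.

    Hypothetical helper for illustration, not part of Anovos or Spark.
    """
    seen_value = False
    for value in values:
        if value is None or value.strip() == "":
            continue  # treat nulls and blanks as missing, not as evidence
        try:
            float(value)
        except ValueError:
            return False
        seen_value = True
    # A column without any usable value gives no reason to cast it.
    return seen_value


# Made-up sample of string values, keyed by column name
sample = {
    "age": ["42", "17", None, "63.5"],
    "zip_code": ["10115", "SW1A 1AA", "94105"],
}
to_cast = [name for name, values in sample.items() if looks_numeric(values)]
print(to_cast)  # ['age']
```

Such a check only flags candidates. Identifiers that happen to consist of digits were often saved as strings on purpose, so the final decision should stay with whoever knows the data.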
🏎 Performance
- Computing the mode and distinct value counts are the most expensive operations in Anovos. We aim to optimize them further in upcoming releases.
- The correlation matrix only supports numerical data. Support for categorical data has been removed due to performance concerns and will return in a later release.
- The invalid entries detection may yield false positives. Hence, be cautious when using the inbuilt treatment option.
- The categorical encoding functions cat_to_num_supervised and cat_to_num_unsupervised may exhibit poor performance and scaling behavior with very high-cardinality columns. Therefore, it is recommended to reduce the cardinality of such columns before encoding them, or to specify an appropriate threshold so that they are dropped from the analysis during encoding.
- The sample size for constructing the imputation models in imputation_sklearn or creating latent features through autoencoder_latentFeatures should be selected with caution, taking into consideration the dataset size and the number of columns. This sample dataset is converted into a Pandas DataFrame, and subsequent operations run on a single node (the driver). If the sample dataset is too large to fit into the driver's memory, this will result in an out-of-memory error.
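
The last two recommendations (setting a cardinality threshold and bounding the sample size) can be approximated with some simple arithmetic before running Anovos. The sketch below is plain Python and makes several assumptions: the helper names are hypothetical, the distinct counts are made-up numbers (in practice they might come from Spark's `approx_count_distinct`), and the memory estimate assumes 8 bytes per cell plus a guessed overhead factor for intermediate copies. Treat the result as a rough upper bound, not a guarantee.

```python
from typing import Dict, List


def high_cardinality_columns(distinct_counts: Dict[str, int], threshold: int) -> List[str]:
    """Names of columns whose distinct-value count exceeds the threshold."""
    return [name for name, count in distinct_counts.items() if count > threshold]


def max_sample_rows(
    memory_budget_bytes: int,
    n_columns: int,
    bytes_per_value: int = 8,      # assumption: one 64-bit number per cell
    overhead_factor: float = 3.0,  # assumption: copies made during processing
) -> int:
    """Rough upper bound on the sample rows that fit into the memory budget."""
    if n_columns <= 0:
        raise ValueError("n_columns must be positive")
    bytes_per_row = n_columns * bytes_per_value * overhead_factor
    return int(memory_budget_bytes // bytes_per_row)


# Made-up distinct-value counts per categorical column
counts = {"country": 195, "user_id": 2_500_000, "device": 40}
print(high_cardinality_columns(counts, threshold=1_000))  # ['user_id']

# 2 GiB of driver memory set aside for the sample, 50 columns
print(max_sample_rows(2 * 1024**3, n_columns=50))  # 1789569
```

String columns usually need far more than 8 bytes per value, so for datasets with many categorical columns a smaller sample than this estimate is the safer choice.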
🔩 Other
- The stability index can currently only be calculated for numerical columns.
- Due to incompatibilities in TensorFlow and Docker, Anovos may not run well on Apple's M1 chips. You can find out more here:
  - Docker documentation on running on Apple hardware
  - Pythonspeed article on Docker build problems on Macs
- The exception and error handling within Anovos is at times inconsistent. Please don't hesitate to file an issue on GitHub if you encounter any problems.
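
Since the stability index (like the correlation matrix) currently handles only numerical columns, it can help to select those columns up front. A minimal sketch, assuming the input is the list of (name, type) pairs that a Spark DataFrame exposes through its `dtypes` attribute; the helper `numerical_columns` and the example schema are hypothetical and not part of Anovos:

```python
from typing import List, Tuple

# Simple type names Spark reports for numerical columns
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double"}


def numerical_columns(dtypes: List[Tuple[str, str]]) -> List[str]:
    """Return the names of columns with a numerical Spark type.

    Decimal types carry precision and scale, e.g. "decimal(10,2)", so they
    are matched by prefix; all other types are matched exactly.
    """
    return [
        name
        for name, dtype in dtypes
        if dtype in NUMERIC_TYPES or dtype.startswith("decimal")
    ]


# Made-up schema in the shape of `df.dtypes`
schema = [
    ("age", "int"),
    ("income", "double"),
    ("city", "string"),
    ("price", "decimal(10,2)"),
    ("signup", "timestamp"),
]
print(numerical_columns(schema))  # ['age', 'income', 'price']
```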