Generative Adversarial Networks (GANs) are widely used to generate synthetic data and have previously been applied directly to time-series data. However, relying solely on the binary adversarial loss is not sufficient to ensure that the model learns the temporal dynamics of the data. TimeGAN introduces an additional reconstruction loss and a supervised loss to tackle this issue and effectively captures the step-wise dependencies of the data. We propose novel improvements to TimeGAN by re-weighting the training iterations over its three training phases, which reduces the overall training time by up to 29% while producing results equal to or better than the benchmark.
Generative Adversarial Networks consist of two main components: a generator and a discriminator. Each is implemented as a neural network. The generator takes random noise as input and outputs synthetic data. This synthetic data is mixed with real training data and used as input for the discriminator. The job of the discriminator is then to tell whether the data presented to it is real or fake. In this manner, the discriminator and the generator act as adversaries: the generator tries to fool the discriminator, and the discriminator tries to learn whether the provided data is real or fake. This creates a min-max game in which the two parties try to minimize and maximize the same loss function, respectively. By using this adversarial loss to train both components, the generator can learn to create high-fidelity synthetic data.
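The adversarial loss described above can be illustrated with a minimal sketch. It assumes the discriminator outputs a probability that a sample is real and uses binary cross-entropy; the generator loss shown is the commonly used non-saturating variant rather than the literal min-max form. The function names and the example probabilities are hypothetical.

```python
import math

def bce(prob, label):
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    eps = 1e-12  # guard against log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def discriminator_loss(d_real, d_fake):
    """Mean loss when real samples are labelled 1 and fake samples 0."""
    losses = [bce(p, 1) for p in d_real] + [bce(p, 0) for p in d_fake]
    return sum(losses) / len(losses)

def generator_loss(d_fake):
    """Non-saturating variant: the generator wants its fakes scored as real (1)."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs (probability that a sample is real).
d_real = [0.9, 0.8]
d_fake = [0.2, 0.1]
print(discriminator_loss(d_real, d_fake))
print(generator_loss(d_fake))
```

The generator loss falls as the discriminator assigns higher "real" probabilities to generated samples, while the discriminator loss falls as it separates the two sets more confidently.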
TimeGAN combines the concepts of generative adversarial networks, auto-regressive models for sequence prediction, and time-series representation learning into a model that can efficiently and effectively create high-fidelity synthetic data.
Besides a sequence discriminator and a sequence generator, TimeGAN makes use of two other main components: an embedding function and a recovery function. The embedding function maps from the feature space into a latent space, which allows the adversarial components to train in a lower-dimensional space and learn the step-wise dependencies in the data. The recovery function converts latent representations back into the feature space. Note that the discriminator and generator train on (and thus take as input and produce as output) the latent vector representations. The embedding and recovery functions are implemented by a recurrent neural network and a feed-forward network, respectively. Another addition to the standard GAN framework is a supervised loss for the generator. The generator receives actual data in the latent space and has to generate the next step (also in the latent space). This ensures that the generator learns the step-wise dependencies and can generate synthetic data with similar step-wise transitions. A block diagram of the training of TimeGAN can be seen in figure 1 below.
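The supervised next-step objective can be sketched as follows. This is an illustration only: it assumes a mean squared error between the generator's prediction and the actual next latent vector, which the text above does not specify, and the function and argument names are hypothetical.

```python
def supervised_loss(latent_real, latent_pred):
    """Mean squared error between predicted and actual next latent steps.

    latent_real: list of T latent vectors produced by the embedding function.
    latent_pred: list of T vectors, where latent_pred[t] is the generator's
                 guess for the latent vector that follows latent_real[t].
    """
    total, count = 0.0, 0
    # Compare the prediction made at step t with the real latent at step t + 1.
    # The final prediction has no real successor, so it is left out.
    for pred, target in zip(latent_pred[:-1], latent_real[1:]):
        for p, y in zip(pred, target):
            total += (p - y) ** 2
            count += 1
    return total / count

# Hypothetical one-dimensional latent sequence and generator predictions.
latent_real = [[0.0], [1.0], [2.0]]
latent_pred = [[1.5], [2.5], [0.0]]
print(supervised_loss(latent_real, latent_pred))
```

A loss of zero means every generated step matches the real transition exactly, which is the behaviour the supervised loss pushes the generator towards.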
The TimeGAN algorithm learns over three different phases of training, as shown in figure 2 below. First, only the embedding and recovery networks are trained in the Embedding Phase. In the second phase, the Supervised Phase, the generator is trained with the supervised loss only. In the last phase, all the components are trained jointly. In this Joint Phase, the algorithm jointly learns to encode, iterate and generate time-series data. Since this last phase combines the first two phases and adds the adversarial loss training, it is the most time-consuming phase. All phases are trained for the same number of iterations, as all phases have equal weighting in the original TimeGAN implementation.
Compared to alternative approaches that use the GAN framework, TimeGAN incurs computational overhead because it adds components to the architecture. Because additional losses must be optimized for these components (embedding loss, recovery loss, supervised loss), TimeGAN has longer training times.
The discriminative score indicates the similarity between the synthetic and original data. To compute this metric, we first label each sequence in the original data set as real and each sequence in the synthetic data set as not real. We then train a classification model to distinguish between the real and synthetic data as a standard supervised machine learning task. The classification error is then calculated on a test set, which gives a measure of the similarity between the synthetic and original data.
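The procedure above can be sketched in a few lines. The classifier here is a deliberately simple stand-in (a threshold on the sequence mean) chosen only to keep the example self-contained; it is not the classification model used in the experiments, and all names and the 20% test split are assumptions.

```python
import random

def fit_threshold(xs, ys):
    """Toy classifier: threshold halfway between the two class means."""
    def avg(values):
        return sum(values) / len(values)
    real_mean = avg([avg(x) for x, y in zip(xs, ys) if y == 1])
    fake_mean = avg([avg(x) for x, y in zip(xs, ys) if y == 0])
    return (real_mean + fake_mean) / 2, real_mean >= fake_mean

def predict_threshold(model, x):
    """Return 1 (real) or 0 (not real) for a single sequence."""
    threshold, real_is_higher = model
    above = sum(x) / len(x) >= threshold
    return 1 if above == real_is_higher else 0

def discriminative_score(real, synthetic, test_fraction=0.2, seed=0):
    """Classification error of a real-vs-synthetic classifier on a test set."""
    # Label original sequences 1 (real) and synthetic sequences 0 (not real).
    data = [(seq, 1) for seq in real] + [(seq, 0) for seq in synthetic]
    random.Random(seed).shuffle(data)
    n_test = max(1, int(len(data) * test_fraction))
    test, train = data[:n_test], data[n_test:]
    model = fit_threshold([x for x, _ in train], [y for _, y in train])
    errors = sum(predict_threshold(model, x) != y for x, y in test)
    return errors / len(test)

# Hypothetical, trivially separable data sets.
real = [[1.0, 1.0] for _ in range(10)]
synthetic = [[0.0, 0.0] for _ in range(10)]
print(discriminative_score(real, synthetic))
```

With trivially separable data the classifier makes no mistakes, so the error is zero; the harder the two sets are to tell apart, the higher the classification error becomes.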
The predictive score is more specific to time-series GANs, as it shows how well the model has captured the temporal dynamics (step-wise dependencies) of the data. To calculate this metric, we first train a sequence-prediction model on the generated data set. After training this model, we evaluate its performance on the original data set. We obtain the predictive score by calculating the mean absolute error.
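A minimal sketch of this train-on-synthetic, test-on-real evaluation is shown below. The "sequence-prediction model" is replaced by a toy predictor that only learns the average one-step change, and the sequences are assumed to be univariate; both are simplifications made for illustration, and the function names are hypothetical.

```python
def fit_step_model(sequences):
    """Toy predictor: learn the mean one-step change from the training data."""
    diffs = [b - a for seq in sequences for a, b in zip(seq, seq[1:])]
    return sum(diffs) / len(diffs)

def predictive_score(synthetic, real):
    """Train on synthetic data, then return the mean absolute error on real data."""
    step = fit_step_model(synthetic)
    # Predict each next value as "current value + learned step".
    errors = [abs((a + step) - b) for seq in real for a, b in zip(seq, seq[1:])]
    return sum(errors) / len(errors)

# Hypothetical sequences: the synthetic data has the same step size as the real data.
synthetic = [[0.0, 1.0, 2.0]]
real = [[5.0, 6.0, 7.0]]
print(predictive_score(synthetic, real))
```

When the synthetic data reproduces the step-wise transitions of the real data, a model trained on it predicts the real data well and the score is low.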
Since our improvement aims to reduce training time, we also use the time it takes to train the TimeGAN algorithm over the different phases as a metric. In addition to the total training time, we calculate the time for each phase as the timestamp at the end of that phase minus the timestamp at the start of that phase.
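This per-phase timing can be sketched as follows. The helper name and the way phases are passed in (as name and callable pairs) are assumptions made for illustration.

```python
import time

def time_phases(phases):
    """Run each (name, callable) phase and record its wall-clock duration in seconds."""
    durations = {}
    for name, run in phases:
        start = time.perf_counter()  # timestamp at the start of the phase
        run()
        durations[name] = time.perf_counter() - start  # end minus start
    durations["total"] = sum(durations.values())
    return durations

# Hypothetical stand-ins for the three training phases.
timings = time_phases([
    ("embedding", lambda: sum(range(1000))),
    ("supervised", lambda: sum(range(1000))),
    ("joint", lambda: sum(range(10000))),
])
print(timings)
```

`time.perf_counter` is used here because it is a monotonic clock intended for measuring elapsed time.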
TimeGAN adds several components and losses to the standard GAN framework. These additional losses yield three different training phases, which train for the same number of iterations because they have equal weights (e.g. when 50000 iterations are provided as a hyperparameter, each phase trains for 50000 iterations). However, it is worth noting that the last phase is significantly more time-consuming than the first two phases. This is shown in figure 3 below.
We propose weighting the number of iterations of each phase differently with respect to the others. Since the first two phases are relatively fast, we investigated how the results are affected by decreasing the iterations used in the last (most time-consuming) phase. Our hypothesis is that by redistributing the iterations removed from the last phase over the first two phases of training, we can compensate and achieve equal or better results. Therefore, we set up 9 different configurations, all of which use 150000 iterations in total but distribute them differently over the three training phases. All configurations have in common that they reduce the training iterations for the Joint Phase. These configurations are shown in table 3.
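As an illustration of how a fixed budget can be distributed, the sketch below splits a total number of iterations over the three phases according to a weight ratio. The helper is hypothetical and the two ratios are examples; the actual configurations are those listed in table 3, which this sketch does not reproduce.

```python
def split_iterations(total, ratio):
    """Split `total` iterations over the phases in proportion to `ratio`."""
    parts = sum(ratio)
    counts = [total * r // parts for r in ratio]
    # Hand any rounding remainder to the first phase so the total is preserved.
    counts[0] += total - sum(counts)
    return counts

# Order: Embedding Phase, Supervised Phase, Joint Phase.
print(split_iterations(150000, (1, 1, 1)))  # [50000, 50000, 50000]
print(split_iterations(150000, (2, 2, 1)))  # [60000, 60000, 30000]
```

The equal ratio corresponds to the benchmark setting of 50000 iterations per phase, while any ratio with a smaller last weight moves iterations from the Joint Phase to the two faster phases.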
We implemented the distribution over phases by adding parameters to the TimeGAN algorithm which are used to specify the iterations for each training phase. To evaluate the performance of our proposed improvement we made use of the familiar discriminative and predictive metrics. In addition, we measure the total training time of TimeGAN to show the overall decrease in training time.
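One possible shape for such per-phase parameters is sketched below. This is not the actual TimeGAN code: the function name, the parameter names, and the `steps` dictionary of per-phase training callables are all hypothetical, and each callable stands in for one optimisation step of the corresponding phase.

```python
def run_training(steps, iterations, embedding_iterations=None,
                 supervised_iterations=None, joint_iterations=None):
    """Run the three phases in order.

    Each phase uses `iterations` unless its own override is given, so calling
    the function without overrides reproduces the equal-weight behaviour.
    """
    schedule = [
        ("embedding", embedding_iterations),
        ("supervised", supervised_iterations),
        ("joint", joint_iterations),
    ]
    executed = {}
    for phase, override in schedule:
        n = iterations if override is None else override
        for _ in range(n):
            steps[phase]()  # one training step for this phase (placeholder)
        executed[phase] = n
    return executed

# Hypothetical no-op training steps, with the Joint Phase shortened.
steps = {name: (lambda: None) for name in ("embedding", "supervised", "joint")}
print(run_training(steps, 6, joint_iterations=3))
```

Keeping the single `iterations` argument as a fallback means existing calls keep their original behaviour, and only configurations that set an override change the schedule.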
The results obtained for the different configurations on the stock data set are plotted in figure 4 below.
Configurations C6 through C8 provide consistent results that are equal to or better than those of the benchmark configuration. Using these configurations, an average speedup in TimeGAN training of between 9% and 29% can be achieved.
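For reference, a speedup percentage of this kind can be computed as the relative reduction in training time with respect to the benchmark. The sketch below assumes that definition, which is not spelled out above, and the timings in the example are made up rather than measured.

```python
def speedup_percent(baseline_seconds, new_seconds):
    """Relative reduction in training time, expressed as a percentage."""
    return 100.0 * (baseline_seconds - new_seconds) / baseline_seconds

# Hypothetical timings, not measurements from the experiments.
print(speedup_percent(200.0, 150.0))  # 25.0
```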
The predictive scores do not change much for any of the configurations. This is mostly because the Supervised Phase is trained for more iterations in all configurations. In the Supervised Phase, the model learns to produce similar step-wise transitions, which is exactly what the predictive score evaluates. The additional supervised training can therefore compensate for the reduced training in the Joint Phase.
To conclude, we found that TimeGAN effectively learns the temporal dynamics of time-series data and is able to generate realistic-looking synthetic data. We proposed novel improvements to the existing algorithm by re-weighting iterations across the three training phases of TimeGAN (Embedding Phase, Supervised Phase, Joint Phase). We found that the Joint Phase is the most time-consuming, and that by decreasing the iterations used for this phase and increasing the iterations for the other phases, the overall performance of TimeGAN can be improved. Our results suggest that for a maximal speedup in training time (29%), the iterations should be weighted in a 2:2:1 ratio for the Embedding, Supervised and Joint Phase, respectively.