Why do data scientists and data engineers work with synthetic data, and how do they obtain it?
Editor's note: this post was written in collaboration with Milan van der Meer. Both authors of this post are on the Real Impact Analytics team, an innovative Belgian big data startup that captures the value in telecom data by "appifying big data".
This tutorial provides a small taste of why you might want to generate random datasets and what to expect from them. It will also walk you through some first examples of how to use Trumania, a data generation Python library. For more information, you can visit Trumania's GitHub!
Why generate random datasets?
Generating random datasets is relevant for both data engineers and data scientists.
As a data engineer, after you have written your new awesome data processing application, you think it is time to start testing end-to-end, and you therefore need some input data.
As a data scientist, you can benefit from data generation since it allows you to experiment with various ways of exploring datasets, algorithms, and data visualization techniques, or to validate assumptions about the behaviour of some method against many different datasets of your choosing.
In both cases, a tempting option is just to use real data. One small problem, though, is that production data is typically hard to obtain, even partially, and it is not getting easier with new European laws about privacy and security. Maybe you are one of the lucky ones and you already have a nice, complete production dataset on your laptop.