FIT5202 – Data processing for Big Data
Assignment 2: Detecting Linux system hacking activities Part B

1. Producing the data (30%)
In this task, we will implement two Apache Kafka producers (one for process and one for
memory) to simulate the real-time streaming of the data.
Important:
– In this task, you need to generate the event timestamp in the UTC timezone for each
data record in the producer, and then convert that timestamp to unix-timestamp
format (keeping the UTC timezone) to simulate the “ts” column. For example, if the
current time is 2020-10-10 10:10:10 UTC, it should be converted to the value
1602324610 and stored in the “ts” column (see the sketch after this list).
– Do not use Spark in this task.
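The timestamp conversion can be done with the standard datetime module alone; the
following is a minimal sketch (the helper name current_unix_ts is illustrative only):

    from datetime import datetime, timezone

    def current_unix_ts():
        # Current time in UTC, converted to an integer unix timestamp,
        # e.g. 2020-10-10 10:10:10 UTC -> 1602324610
        return int(datetime.now(timezone.utc).timestamp())
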
1.1 Process Event Producer (15%)
Write a Python program that loads all the data from “Streaming_Linux_process.csv”. Save
the file as Assignment-2B-Task1_process_producer.ipynb.
Every 5 seconds, your program should send X records from each machine, following the
sequence, to the Kafka stream (a sketch follows the list below).
– The number X should be a random integer between 10 and 50 (inclusive), regenerated
for each machine in each cycle.
– You will need to append the event time in unix-timestamp format (as mentioned above).
– If the data is exhausted, restart from the first sequence.
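The following is a minimal sketch of such a producer, assuming the kafka-python
library, a broker at localhost:9092, a topic named "process", and a "machine" column
in the CSV that identifies the sending machine; none of these names are prescribed by
the task description above.

    import csv, json, random, time
    from datetime import datetime, timezone
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=['localhost:9092'],
        value_serializer=lambda v: json.dumps(v).encode('utf-8'))

    # Load all rows and group them by machine, preserving file order (the "sequence").
    rows_by_machine = {}
    with open('Streaming_Linux_process.csv') as f:
        for row in csv.DictReader(f):
            rows_by_machine.setdefault(row['machine'], []).append(row)

    positions = {m: 0 for m in rows_by_machine}           # next row to send per machine

    while True:
        ts = int(datetime.now(timezone.utc).timestamp())  # event time as UTC unix timestamp
        for machine, rows in rows_by_machine.items():
            x = random.randint(10, 50)                    # X regenerated per machine per cycle
            for _ in range(x):
                record = dict(rows[positions[machine]])
                record['ts'] = ts
                producer.send('process', value=record)
                positions[machine] = (positions[machine] + 1) % len(rows)  # wrap when exhausted
        producer.flush()
        time.sleep(5)                                     # one cycle every 5 seconds
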
1.2 Memory Event Producer (15%)
Write a Python program that loads all the data from “Streaming_Linux_memory.csv”. Save
the file as Assignment-2B-Task1_memory_producer.ipynb.
Every 10 seconds, your program should send X records from each machine, following the
sequence, to the Kafka stream. In the same cycle, also generate Y records with the same
timestamp; these Y records should be sent 10 seconds later (i.e., in the next cycle).
A sketch follows below.
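The following is a minimal sketch of the memory producer under the same assumptions
(kafka-python, a broker at localhost:9092, a topic named "memory", a "machine" column).
The range of Y is not given in this excerpt, so it is drawn the same way as X purely
for illustration; the delayed Y records keep the timestamp of the cycle in which they
were generated and are sent at the start of the next cycle.

    import csv, json, random, time
    from datetime import datetime, timezone
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=['localhost:9092'],
        value_serializer=lambda v: json.dumps(v).encode('utf-8'))

    rows_by_machine = {}
    with open('Streaming_Linux_memory.csv') as f:
        for row in csv.DictReader(f):
            rows_by_machine.setdefault(row['machine'], []).append(row)

    positions = {m: 0 for m in rows_by_machine}
    delayed = []                                          # Y records held over from the previous cycle

    def take(machine, n, ts):
        # Take the next n rows for this machine, stamping each with ts.
        out, rows = [], rows_by_machine[machine]
        for _ in range(n):
            record = dict(rows[positions[machine]])
            record['ts'] = ts
            out.append(record)
            positions[machine] = (positions[machine] + 1) % len(rows)
        return out

    while True:
        # First send the records generated (but held back) in the previous cycle.
        for record in delayed:
            producer.send('memory', value=record)
        delayed = []

        ts = int(datetime.now(timezone.utc).timestamp())
        for machine in rows_by_machine:
            x = random.randint(10, 50)                    # X records sent immediately
            y = random.randint(10, 50)                    # Y records held back (range assumed)
            for record in take(machine, x, ts):
                producer.send('memory', value=record)
            delayed.extend(take(machine, y, ts))          # same timestamp, sent ~10 s later
        producer.flush()
        time.sleep(10)                                    # one cycle every 10 seconds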