truthtracer/super-node

Copyright (c) 2019-2050 TruthTracer. All rights reserved.

Author
TruthTracer

License
super-node is licensed under the LGPL v3.

See LICENSE for the full license text.

About
A supervisor and data node for various workers in edge computing (AI inference, including LLM backends), IoT, etc.
It monitors all task workers and keeps them running (health check / restart / send params).
It collects all data from workers into queue caches, to be consumed by other business services.

workflow
    the basic pattern is:
    1. the node receives a task from '/cmd' and starts a worker with params
    2. the worker sends data back to the node (master) via websocket ('/put'); the node caches the data in a queue of limited size
    3. other apps fetch data from the node via websocket ('/fetch')
    4. for every task worker, the node uses a fixed-length ordered queue, with configurable length, as its cache
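The per-task cache in steps 2 to 4 can be sketched as below. This is only an illustration, not the node's actual code: the class name `TaskCache` and its methods are hypothetical, and dropping the oldest item when a queue is full is an assumption (the text only says the queue size is limited).

```python
from collections import deque


class TaskCache:
    """Hypothetical per-task cache: one bounded, ordered queue per task_id."""

    def __init__(self, max_len):
        self.max_len = max_len
        self.queues = {}  # task_id -> deque of cached items

    def put(self, task_id, item):
        # deque(maxlen=N) silently discards the oldest item once N is reached
        # (assumed overflow policy, not confirmed by the project).
        queue = self.queues.setdefault(task_id, deque(maxlen=self.max_len))
        queue.append(item)

    def fetch(self, task_id):
        # Return the oldest cached item for the task, or None if nothing is cached.
        queue = self.queues.get(task_id)
        return queue.popleft() if queue else None


cache = TaskCache(max_len=2)
for frame in ("a", "b", "c"):
    cache.put("camera-1", frame)
first = cache.fetch("camera-1")  # "a" was pushed out when "c" arrived
```

A bounded deque keeps memory use fixed per task, which matters on edge devices where consumers may fall behind the workers.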

workers can be:
  IoT sensor collectors
  audio / video pullers
  data from other network services (rpc/http/websocket/mqtt/redis ...)
  other computation jobs / data gathering

api design
  '/m/cmd' : send a command to the node, in JSON
  '/m/fetch/[task_id]' : a business app gets data from a specific task; when task_id is not passed, all data can be fetched one after another
  '/m/stat' : get task status
  '/w/ch' : a worker channel to put data to the node, get commands from the master ...
  multiple clients are allowed to talk to the same api url
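As a rough illustration of the URL layout above, the sketch below maps a request path to an endpoint and an optional task_id. The `Route` type, the `parse_route` function and the endpoint names it returns are hypothetical; only the four paths come from the list above.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    name: str                      # hypothetical endpoint name
    task_id: Optional[str] = None  # only set for '/m/fetch/[task_id]'


def parse_route(path: str) -> Optional[Route]:
    """Map a request path to a Route, or None if the path is unknown."""
    parts = [p for p in path.split("/") if p]
    if parts == ["m", "cmd"]:
        return Route("cmd")
    if parts == ["m", "stat"]:
        return Route("stat")
    if parts == ["w", "ch"]:
        return Route("worker_channel")
    if parts[:2] == ["m", "fetch"] and len(parts) <= 3:
        # task_id is optional: without it the caller asks for data of all tasks.
        return Route("fetch", parts[2] if len(parts) == 3 else None)
    return None


route = parse_route("/m/fetch/camera-1")  # Route(name="fetch", task_id="camera-1")
```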

worker management
  monitors all worker processes and restarts any worker that is unresponsive, frozen or abnormal ...
  register as a worker, 2 methods
    exec a child process
    connect as a net client
  restart conditions
    process status : not configurable; a dead/gone process is killed and restarted
    process memory usage : configurable, in KB, default 0 : no limit
    process execution timeout : configurable, in ms, default 0 : no limit
    net client heartbeat
  data structure
    task_id : string, unique among all workers of one node
    start_params : string
    pid : int, used to send signals to the process (SIGTERM, SIGKILL ...)
    first_sig : the signal sent to the process first when a restart of the worker is triggered
    second_sig_duration : int, in ms, time to wait after first_sig before SIGKILL is used to terminate the process
    heartbeat_interval : interval of the heartbeat check, used when the worker talks to the node as a net client
    accept_tasks : yes/no, default no; needs to be supported by the worker
    exec_interval : int, in ms, interval for repeatedly executing a command
  PS: a worker reconnecting to continue its work after the node is gone is not supported, because the node cannot persist some states (task restart count, stopped tasks ...)
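The restart conditions and the two-step stop (first_sig, then SIGKILL) described under worker management could look roughly like this. It is a sketch under assumptions, not the project's implementation: `max_memory_kb` and `exec_timeout_ms` are hypothetical names for the configurable limits, the default values are made up, and treating a silence longer than heartbeat_interval as a missed heartbeat is an assumption. The signal handling targets POSIX systems.

```python
import signal
import subprocess
from dataclasses import dataclass


@dataclass
class WorkerRecord:
    task_id: str
    start_params: str = ""
    first_sig: int = signal.SIGTERM   # assumed default
    second_sig_duration: int = 3000   # ms to wait before SIGKILL (assumed default)
    heartbeat_interval: int = 0       # ms, 0 : no heartbeat check (assumption)
    max_memory_kb: int = 0            # hypothetical name, 0 : no limit
    exec_timeout_ms: int = 0          # hypothetical name, 0 : no limit


def should_restart(rec, alive, memory_kb, running_ms, ms_since_heartbeat):
    """Return the reason a worker needs a restart, or None if it looks healthy."""
    if not alive:
        return "dead"
    if rec.max_memory_kb and memory_kb > rec.max_memory_kb:
        return "memory"
    if rec.exec_timeout_ms and running_ms > rec.exec_timeout_ms:
        return "timeout"
    if rec.heartbeat_interval and ms_since_heartbeat > rec.heartbeat_interval:
        return "heartbeat"
    return None


def stop_process(proc, rec):
    """Send first_sig, wait second_sig_duration ms, then fall back to SIGKILL."""
    proc.send_signal(rec.first_sig)
    try:
        return proc.wait(timeout=rec.second_sig_duration / 1000)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()
```

Giving the worker a grace period after first_sig lets it flush data or release devices before it is killed; the supervisor can then start a fresh process with the same start_params.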

data gathering
  data received from workers is cached in a queue (length=N) as entities
  data is fetched from the queue via REST or websocket
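The shape of a cached entity is not specified here. As a purely hypothetical example, an entity could wrap each worker payload with the task_id, a sequence number and a receive timestamp, and be serialized to JSON when a client fetches it. All names below (`Entity`, `wrap`, `to_json`) are assumptions made for illustration.

```python
import itertools
import json
import time
from dataclasses import asdict, dataclass
from typing import Any

_seq = itertools.count(1)  # node-wide sequence counter (assumption)


@dataclass
class Entity:
    task_id: str
    seq: int      # preserves ordering across fetches
    ts: float     # receive time, seconds since the epoch
    payload: Any  # data as sent by the worker, must be JSON-serializable


def wrap(task_id, payload):
    """Wrap a worker payload into an Entity before it is queued."""
    return Entity(task_id, next(_seq), time.time(), payload)


def to_json(entity):
    """Serialize an Entity for a REST or websocket fetch response."""
    return json.dumps(asdict(entity))


message = to_json(wrap("sensor-7", {"temp": 21.5}))
```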
