Hi @<1636175432829112320:profile|PlainSealion45>
- I used this initial model to create the endpoint with
model add
command.
I think that the initial model needs to be added with model auto-update
Not with model add
Basically do not call model add - this is static, always using the model ID specified (you can deploy new models by manually calling model add on the same endpoint and specifying a different model ID, but again that is manual)
To Automatically have the m...
Hi JuicyFox94 ,
Actually we just added that 🙂 (still on GitHub , RC soon)
https://github.com/allegroai/clearml/blob/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/clearml/automation/controller.py#L696
Hi @<1524922424720625664:profile|TartLeopard58>
Yes, this is the default; it is designed to serve multiple models and scale horizontally
WickedGoat98 sure that will not be complicated:
try something along the lines of:
agent:
  networks:
    - backend
  container_name: clearml-agent
  image: allegroai/clearml-agent:latest
  restart: unless-stopped
  privileged: true
  environment:
    CLEARML_HOST_IP: ${CLEARML_HOST_IP}
    CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-}
    CLEARML_API_HOST:
    CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-}
    CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY:-}
...
task.update({'script': {'version_num': 'my_new_commit_id'}})
This will update to a specific commit id, you can pass empty string '' to make the agent pull the latest from the branch
My clearml-server server crashed for some reason
😞 No worries
So it seems decorator is simply the superior option?
Kind of yes 😊
In which case would we use add_task() option?
When you have existing Tasks, and the piping is very straightforward (i.e. input / output in the code is basically referencing other Tasks/artifacts, and there is no real need to do any magic for serializing/deserializing data between steps)
For example, opening a project or experiment page might take half a minute.
This implies a MongoDB performance issue
What's the size of the mongo DB?
Hi AverageBee39
What's the clearml-server and clearml package you are using ?
(It looks like some capability that is missing from the server, i.e. needs upgrade ?!)
Oh, did you try task.connect_configuration ?
https://allegro.ai/docs/examples/reporting/model_config/#using-a-configuration-file
or by trains
We just upload the image as is ... I think this is a SummaryWriter issue
Actually that is less interesting, as it is quite straightforward
Also, can the image not be pulled from dockerhub but used from the local build instead?
If you have your docker configured to pull from local artifactory, then the agent will do the same 🙂 (it is calling the docker command just like you do)
agent.default_docker.arguments: "--mount type=bind,source=$DATA_DIR,target=/data"
Notice that you are using the default docker arguments in the example
If you want the mount to always be there use extra_docker_arguments :
https://github.com/...
This seems to be more complicated than what it looks like (ui/backend combination), not that we are not working on it, just that it might take some time as it passes control to the backend (which by design does not touch external storage points).
Maybe we should create an S3 cleanup service, listing buckets and removing if the Task ID does not exist any longer. wdyt?
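A rough sketch of the selection logic such a cleanup service could use, just to make the idea concrete. It is plain Python and does not talk to S3 or the server: the caller is assumed to pass in the bucket listing and the set of Task IDs that still exist. The key layout (a 32-character hex Task ID after a dot in one of the path segments) and the name find_orphaned_keys are assumptions for illustration, so adjust the pattern to whatever your bucket actually looks like.

```python
import re
from typing import Iterable, List, Set

# Assumption: object keys look like "<project>/<name>.<task_id>/...",
# where <task_id> is a 32-character lowercase hex string.
TASK_ID_RE = re.compile(r"\.([0-9a-f]{32})(?:/|$)")


def find_orphaned_keys(object_keys: Iterable[str], existing_task_ids: Set[str]) -> List[str]:
    """Return the keys whose embedded Task ID is not in existing_task_ids."""
    orphaned = []
    for key in object_keys:
        match = TASK_ID_RE.search(key)
        if match is None:
            # Not recognisable as Task output, so leave it untouched.
            continue
        if match.group(1) not in existing_task_ids:
            orphaned.append(key)
    return orphaned
```

Keys that do not match the pattern are skipped on purpose, so the service would never delete something it cannot tie to a Task.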
Hey, is it possible for me to upload a pdf as an artefact?
Sure, just point to the file and it will upload it for you 🙂
PleasantGiraffe85
it took the repo from the cache. When I delete the cache, it can't get the repo any longer.
what error are you getting ? (are we talking about the internal repo)
Hi SmugSnake6
I think it was just fixed, let me check if the latest RC includes the fix
RoundMosquito25 good news, no need to open any ports 🙂
Basically the B_i agents are always polling the server for "jobs", i.e. they create an http/s request from them to the server, so all connections are outgoing connections. Firewall is intact 🙂
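To illustrate the polling idea (this is only a sketch of the pattern, not the agent's actual code): the worker repeatedly asks the server for work, so every connection is opened from the worker side. Here fetch_job stands in for the outbound http/s request and run_job for executing the job; both names, and the poll_for_jobs helper itself, are hypothetical.

```python
import time
from typing import Callable, Optional


def poll_for_jobs(
    fetch_job: Callable[[], Optional[str]],
    run_job: Callable[[str], None],
    interval_sec: float = 5.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll for jobs and run them; return the number of jobs handled."""
    handled = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        job = fetch_job()  # outbound request, the server never connects to us
        if job is None:
            sleep(interval_sec)  # nothing queued, wait before asking again
            continue
        run_job(job)
        handled += 1
    return handled
```

Since the worker only ever initiates requests, nothing on its side has to accept incoming traffic.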
StaleButterfly40 just making sure I understand, are we trying to solve the "import offline zip file/folder" issue, where we create multiple Tasks (i.e. Task per import)? Or are you suggesting the Actual task (the one running in offline mode) needs support for continue-previous execution ?
Lately I've heard of groups that do slices of datasets for distributed training, or who "stream" data.
Hmm so maybe a "glob" alike parameter for get_local_copy(select_filter='subfolder/*') ?
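Something like the following is what I have in mind for the filter semantics. Note that select_filter is only the proposed parameter name, it is not an existing argument, and select_files is a hypothetical helper working on a plain list of relative paths.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def select_files(relative_paths: Iterable[str], select_filter: Optional[str] = None) -> List[str]:
    """Keep only the paths matching a glob-like pattern (all paths if no filter)."""
    if select_filter is None:
        return list(relative_paths)
    # fnmatchcase is case-sensitive on every platform; its "*" also matches "/",
    # so "subfolder/*" includes files in nested folders.
    return [path for path in relative_paths if fnmatchcase(path, select_filter)]
```

With this behaviour 'subfolder/*' would pull everything under subfolder, including nested folders, which seems like the useful default for slicing a dataset.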
RC should be out later today (I hope), this will already be there, I'll ping here when it is out
Hi JitteryCoyote63 you can, but obviously you should be careful, they might both try to allocate more GPU memory than the HW actually has.
TRAINS_WORKER_NAME=machine_gpu0A trains-agent daemon --gpus 0 --queue default --detached
TRAINS_WORKER_NAME=machine_gpu0B trains-agent daemon --gpus 0 --queue default --detached
Hmm are you getting the warning on the client side , or in the clearml-server ?
Task.add_requirements('.')
Should work
Hi RotundHedgehog76
Notice that the "queued" is on the state of the Task, as well as the tag
We tried to enqueue the stopped task at the particular queue and we added the particular tag
What do you mean by specific queue ? this will trigger on any Queued Task with the 'particular-tag' ?
This really makes little sense to me...
Can you send the full clearml-session --verbose console output ?
Something is not working as it should obviously, console output will be a good starting point
does this mean that Task stores --args (and propagates these further through the code as CLI arguments) somewhere where i can get and manipulate them from my code?
Yes it changes the actual argparse object and pushes the new values in runtime, basically your parse_args() call will return the values from the UI (backend)
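To get a feel for the effect (this is only an illustration with plain argparse, not how the SDK does it internally): imagine the values edited in the UI arrive as a dict and win over whatever was on the command line. The helper name parse_with_overrides and the overrides dict are made up for the example.

```python
import argparse
import sys
from typing import Dict, List, Optional


def parse_with_overrides(
    parser: argparse.ArgumentParser,
    overrides: Dict[str, str],
    argv: Optional[List[str]] = None,
) -> argparse.Namespace:
    """Parse argv, letting the override values take precedence."""
    argv = list(sys.argv[1:] if argv is None else argv)
    for name, value in overrides.items():
        # Appending the option again makes argparse keep the last value,
        # and reuses the parser's own type conversion.
        argv += [f"--{name}", str(value)]
    return parser.parse_args(argv)


parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.1)
parser.add_argument("--epochs", type=int, default=1)

args = parse_with_overrides(parser, {"lr": "0.01"}, ["--lr", "0.5", "--epochs", "3"])
```

Here args.lr comes from the override while args.epochs keeps the command line value, which is the behaviour you see when a Task is executed remotely with edited arguments.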
And is Task.init called on all processes ?
JitteryCoyote63 Not sure how/why the X-Pack feature was on (it is not used by the system), but you can disable it with an environment variable in the docker-compose:
xpack.security.enabled=false
Should solve the problem ...