Currently, imshow plots are shown in the debug samples section.
SmugTurtle78
by the end of this week
Hi GrievingTurkey78
If you'd like to have the same environment in trains-agent, you can use the detect_with_pip_freeze option on your local machine, in your ~/trains.conf file.
Just change detect_with_pip_freeze: true
( https://github.com/allegroai/trains/blob/master/docs/trains.conf#L168 is an example)
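If it helps, in the sample conf linked above that key should sit under the sdk.development section, so the change looks roughly like sdk { development { detect_with_pip_freeze: true } } (adjust to wherever the key already appears in your file).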
Hi GloriousPanda26, great, I'll check that. I didn't understand whether the original usage got you the configuration or not (I got that with to_container you can connect_configuration).
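For reference, a rough sketch of what I meant (all names here are placeholders):

from clearml import Task
from omegaconf import OmegaConf

task = Task.init(project_name='examples', task_name='omegaconf config')
cfg = OmegaConf.create({'lr': 0.001, 'batch_size': 32})  # stand-in for your Hydra/OmegaConf cfg
# convert the OmegaConf object to a plain dict and attach it to the task
config_dict = OmegaConf.to_container(cfg, resolve=True)
task.connect_configuration(config_dict, name='OmegaConf')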
Hi ItchyHippopotamus18, can you try with torch.save(model_jit, os.path.join(checkpoint_path, f'{epoch_num}_{round(acc_full, 4)}.pt')) ?
Can you see it in the model? Click on the model link to get into the model
Hi GleamingGrasshopper63 ,
ClearML will take your configuration from the ~/clearml.conf file (so it should use the file if you are the root user). You can also configure env vars for your auth and API:
export CLEARML_API_HOST={api_server}
export CLEARML_WEB_HOST={web_server}
export CLEARML_FILES_HOST={files_server}
export CLEARML_API_ACCESS_KEY={access_key}
export CLEARML_API_SECRET_KEY={secret_key}
Can you verify the configuration file location or try with the env vars?
WackyRabbit7 you can also use task.execute_remotely() once the task is configured, like in the https://github.com/allegroai/clearml/blob/master/examples/pipeline/step1_dataset_artifact.py#L6 example.
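A minimal sketch of that pattern (the queue name is just an example):

from clearml import Task

task = Task.init(project_name='examples', task_name='remote execution step')
# ...configure the task here (parameters, artifacts, etc.)...
# stop the local run and enqueue the task for a clearml-agent to pick up
task.execute_remotely(queue_name='default', exit_process=True)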
Try to clone the task (right click on the task and choose "Clone") and you will get a new task in draft mode that you can configure ( https://clear.ml/docs/latest/docs/getting_started/mlops/mlops_first_steps#clone-an-experiment )
Hi ThickDove42, you are right, I can verify I also got the same (clone, edit the script, enqueue, and when the worker starts to run, the SETUP SHELL SCRIPT is empty); feels like a bug. Will update you once it's solved.
Is this shell script you want to run common to all your tasks or just for a specific one?
It should create task B with the same commit as task A in this scenario, do you have different commits?
Same credentials configuration for the ClearML-Agent.
Notice that when a task is created, in the UI, under the EXECUTION tab, you can find (and change if you like) the output destination.
Hi PunyBee36, what about pulling the task? Does it work?
About the running task: I can see in the logs that a new instance was created (i-02fc8...), can you check if you have a running ClearML agent on it? If so, the agent will pull the task from the queue; if not, can you check this instance's logs for errors and share them?
Basically we can use pigar or pip freeze to get the packages & versions (plus change them and create a template in the UI). What is the specific scenario you have? Maybe we can think about another solution.
Hi DepressedChimpanzee34 ,
Hydra should be auto patched, did you try this example?
https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py
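Roughly, the pattern in that example boils down to something like this (project and config names here are placeholders):

import hydra
from omegaconf import DictConfig
from clearml import Task

@hydra.main(config_path='.', config_name='config')
def my_app(cfg: DictConfig) -> None:
    # Task.init inside the Hydra entry point; the Hydra config should be picked up automatically
    task = Task.init(project_name='examples', task_name='hydra example')
    print(cfg)

if __name__ == '__main__':
    my_app()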
yep, you should get a dict like:
{'title': {'series': {'x': [0, 1, 2], 'y': [10, 11, 12]}}}
Well, the happy flow is to execute it locally, then clone this task and run it in the agent. If you don't want to run it locally, you can use task.execute_remotely() and clone this one, or just start running it and kill it manually, or run one epoch to view the outputs.
Hi FloppyDeer99 ,
It depends on your setup:
- On-prem machines: you can start more than one clearml-agent on the machine that has the resources and assign, for example, each GPU on the machine to its own agent in https://clear.ml/docs/latest/docs/clearml_agent#docker-mode (see the example commands below).
- Cloud machines: you can have the same setup, and if you are using AWS you can run the https://clear.ml/docs/latest/docs/guides/services/aws_autoscaler/ as a service.
- K8s: there is a great example for the k8s glue https://github.com/...
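For example, on a machine with two GPUs you could run something like clearml-agent daemon --queue gpu0_queue --gpus 0 --docker and clearml-agent daemon --queue gpu1_queue --gpus 1 --docker (the queue names are just placeholders), so each agent serves its own queue with its own GPU.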
I hope you understand what I mean
yes
let me check that
Great, so if you have an image with the clearml-agent, it should solve it.
Thanks SmugTurtle78, checking it.
Hi UnevenDolphin73 ,
Which agent version are you using? Did you set up the env variable on the agent's machine too?
- Can you set the env var CLEARML_DOCKER_SKIP_GPUS_FLAG to true?
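For example, something like export CLEARML_DOCKER_SKIP_GPUS_FLAG=true in the environment the agent runs in.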
Regarding this - https://clearml.slack.com/archives/CTK20V944/p1657525402861009?thread_ts=1657291641.224139&cid=CTK20V944 - can you add some more info? maybe the log?
Do you report with the LightningModule log function? Something like self.log('train_loss', loss) ?
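For context, a rough sketch of that reporting pattern (compute_loss here is a hypothetical helper, not part of Lightning):

import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical helper returning the loss tensor
        # values passed to self.log go to the Lightning logger, which the ClearML task should capture
        self.log('train_loss', loss)
        return loss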