Which storage are you using? ClearML files server?
Can you verify the paths you are using in your script?
Trains does patch the torch save function 🙂
If you like, you can save the model for each epoch by giving each one a unique name. The model will be saved to the `output_uri` path you set in the `Task.init` call.
For example, this code will save a model for every epoch:

```python
for epoch in range(num_of_epoch):
    # save a checkpoint with a unique name per epoch
    torch.save(model, "model_epoch_{}".format(epoch))
```
Hey SubstantialElk6 ,
You can try adding environment vars with that info:
```python
os.environ["CLEARML_API_HOST"] = api_server
os.environ["CLEARML_WEB_HOST"] = web_server
os.environ["CLEARML_FILES_HOST"] = files_server
os.environ["CLEARML_API_ACCESS_KEY"] = access_key
os.environ["CLEARML_API_SECRET_KEY"] = secret_key
```
Hi UnsightlyShark53 ,
Trying to understand the scenario: you want the model to be saved in the `trains_storage` dir, but `trains` saves it in `trains_storage/trains_storage`? Or does `torch.save` not save to the path at all?
Hi LethalCentipede31
You can report plotly with `task.get_logger().report_plotly`, like in https://github.com/allegroai/clearml/blob/master/examples/reporting/plotly_reporting.py
For seaborn, once you call `plt.show` the figure will show up in the UI (example: https://github.com/allegroai/clearml/blob/master/examples/frameworks/matplotlib/matplotlib_example.py#L48 )
Hi PanickyMoth78 ,
Can you try with `pip install clearml==1.8.1rc0`? It should include a fix for this issue
Hi UnevenDolphin73 ,
which agent version are you using? Did you set up the env variable on the agent’s machine too?
- Can you set the env var `CLEARML_DOCKER_SKIP_GPUS_FLAG` to `true`?
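For example, on the agent’s machine before starting the agent (the queue name in the comment is a placeholder):

```shell
# make clearml-agent skip docker's --gpus flag
export CLEARML_DOCKER_SKIP_GPUS_FLAG=true
# then start the agent as usual, e.g.:
# clearml-agent daemon --queue default --docker
echo "$CLEARML_DOCKER_SKIP_GPUS_FLAG"  # prints: true
```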
Regarding this - https://clearml.slack.com/archives/CTK20V944/p1657525402861009?thread_ts=1657291641.224139&cid=CTK20V944 - can you add some more info? maybe the log?
So according to it, you are using the repo requirements, and you have torch there?
the controller task? same as here - https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_controller.py
Hi JitteryCoyote63 ,
thanks for reporting it, I was able to reproduce the issue, will update here once a fix is out 🙂
```python
logger.report_matplotlib_figure(title="some title", series="some series",
                                figure=fig, iteration=1, report_interactive=False)
```
Hi CooperativeSealion8 ,
`trains` is configured according to the `~/trains.conf` file; in this file you should define the app, api and files servers.
You can do it with our great wizard, just type `trains-init` in your CLI and follow the instructions:
```
❯ trains-init
TRAINS SDK setup process
Please create new trains credentials through the profile page in your trains web app (e.g. )
In the profile page, press "Create new credentials", then press "Copy to clipboard".
Paste cop...
```
We can certainly add a brief about the `trains.conf` file, thanks for the feedback 🙂
Thanks for the answer. So, for example (to make sure I understand): with the example you gave above, when I print the config I’ll see the newly edited parameters?
Correct
What about the second part of the question, would it be parsed according to the type hinting?
It should
DepressedChimpanzee34 how do you generate the task that’s running remotely? Once the agent has pulled the task, that is your running configuration (it will pull the same configuration from the server as you see in the UI)
`python invoked oom-killer` means the process ran out of memory. CloudySwallow27, in the scaler app task, can you check if you have scalars reporting?
Hi RoundMosquito25 ,
Are you running your project as part of a git repository? If so, you can just add a `Task.init()` call in the main script you are running (e.g. your train.py / main_file.py file) and everything should be logged automatically.
For reporting artifacts ( https://clear.ml/docs/latest/docs/guides/reporting/artifacts ), you can use your task object like so:
```python
task = Task.init(project_name="My project", task_name="My task")
# ...
task.upload_artifact(
    'my artifact name',
    artifact_object=my_object)  # my_object is whatever object you want to store
```
If you want to clear the parameters, you can try overriding with an empty dict:

```python
cloned_task.set_parameters({})
```
Why not using it directly from S3?
You can download it with the `StorageManager`: https://allegro.ai/clearml/docs/docs/examples/examples_storagehelper.html#downloading-a-file
I guess not many people use the local file storage
I’m using it 🙂
How can I reproduce this issue? What should I have as `cache_dir`? `~/.clearml`?
It should be fixed in one of the next versions
For general duration, the difference between the Started and Updated columns should be the total experiment duration
Hi @<1523707056526200832:profile|ScaryKoala63>,
Try using `task.upload_artifact` for manually uploading artifacts, like in this example; you can also configure the upload destination
Hi AverageRabbit65 ,
Is this part of a repository? If so, you can specify the repo in the `add_function_step` call
Hi DefeatedCrab47 , did you reload with Ctrl + F5? If not, can you try and update me?