
but we run everything in docker containers. Will it still help?
As long as you are running with clearml-agent (in docker mode), all the cache folders (this one included) are mounted on the host machine for persistence.
Does this mean the model weights are stored on the clearml-server file system?
By default they are just logged (i.e. the local path is stored, but the file is not uploaded). If you want to automatically store the model, pass output_uri=True to Task.init, or point it at any object store / shared folder (e.g. output_uri='s3://bucket/folder'). ClearML will automatically create a subfolder for the Task and upload all models/artifacts to it.
task = Task.init(project_name='ex...
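For reference, a minimal sketch of the full call could look like this (project/task names and the bucket path are placeholders):
```python
from clearml import Task

# placeholders: use your own project/task names and storage location
task = Task.init(
    project_name='examples',
    task_name='model upload demo',
    # output_uri=True uploads to the files server;
    # an object-store URI or a shared folder works as well
    output_uri='s3://bucket/folder',
)
```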
BTW: 0.14.3 solved the issue you are referring to, so you can import trains before parsing the args without an issue. Regarding passing project/name as parameters, a few thoughts: (1) you can always rename / move projects from the UI; (2) if you are running it with trains-agent there is no meaning to these arguments, as by definition the Task was already created... Maybe we should give an option to exclude a few arguments from argparse, I think this topic came up a few times... What d...
AstonishingSeaturtle47 yes it does. But I have to ask, how come you have submodules where one would have credentials for the master repo but not for the sub ones? Also, it sounds like a good solution would be for the trains-agent to try to pull the submodules and, if it cannot, just print a warning and continue. What do you think?
When I give my MinIO to the output_uri argument, it uploads at 500 KB/sec as before.
But it worked well when using StorageManager and uploading to the MinIO directly, is that correct?
... I give my MinIO to the output_uri argument
How long did it take to run the demo code I posted?
(The one you mentioned took 0.16s to run locally)
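For comparison, a direct upload with StorageManager might look roughly like this (a sketch; the local file and bucket/path are placeholders, and the MinIO endpoint and credentials are assumed to be configured in the s3 section of clearml.conf):
```python
from clearml import StorageManager

# placeholders: adjust the local file and the bucket/path
StorageManager.upload_file(
    local_file='/path/to/model.bin',
    remote_url='s3://my-minio-bucket/models/model.bin',
)
```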
I would like to use ClearML together with Hydra multirun sweeps, but I'm having some difficulties with the configuration of tasks.
Hi SoreHorse95
In theory that should work out of the box; why do you need to manually create a Task (as opposed to just having the Task.init call inside the code)?
Okay I think I found the confusion here (and it is confusing, but also very cool)
These lines:
metrics_names = {"metrics": ["name", "bias", "r2"]}
task.connect(metrics_names)
When running in "manual mode" (i.e. not by an agent), will take the dict metrics_names
and put it in the Task's Hyperparameters section.
But when executed by the Agent, it will do the opposite! It will take the data stored in the Task's Hyperparameters section and put it back into the metrics_names variable...
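A minimal sketch of that two-way behavior (project/task names are placeholders):
```python
from clearml import Task

task = Task.init(project_name='examples', task_name='connect demo')  # placeholder names

# manual run: this dict is logged under the Task's Hyperparameters section;
# agent run: the values stored on the Task overwrite the dict before use
metrics_names = {"metrics": ["name", "bias", "r2"]}
task.connect(metrics_names)

print(metrics_names)  # reflects edits made in the UI when executed by an agent
```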
So you have two options
- Build the container from your docker file and push it to your container registry. Notice that if you built it on the machine running the agent, that machine can use it as the Task's base container.
- Use the FROM container as the Task's base container and have the rest as the docker startup bash script. Wdyt?
I think for it to work you have to have ssh running on the host machine (the socket client itself), no?
but then an error message in the web-app pops up
Fetch parents failed
and the Scheduler task disappears
And the Task is still running? What's the clearml python version and WebUI version?
to fix it, I excluded this var entirely from the docker-compose
Makes sense.
the path to the JSON file
Yep, that's what I did and things seem to work... Let me check again if I missed anything
Sure, in that case, wait until tomorrow, when the github repo is fully synced
TightElk12 are you still looking for a way to create a new "sub-task" ?
DefeatedOstrich93 what do you mean by "I am wondering why do I need to create files before applying diff?"
git diff will not list files unless they are added; new files are marked as "untracked" (think temp files, logs, etc.), and until you add a file to git it will basically ignore that file. Make sense?
We already have the feature-store to save all data, that's why I don't need to save it (just a reference to the dataset version).
that makes sense, so why don't you point to the feature store ?
I can have different steps of the pipeline running on different machines. But this is not my use case.
If they are running on the same machine, you can basically return a path to the local storage, or change the output_uri to the local storage; this will cause them to get serialized to the l...
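For example, a sketch of pointing a step at local/shared storage (the folder path and all names are placeholders):
```python
from clearml import Task

# placeholder path: any folder all steps on this machine can read/write
task = Task.init(
    project_name='pipelines',
    task_name='step-1',
    output_uri='/mnt/shared/clearml',  # artifacts/models are serialized here
)
task.upload_artifact(name='features', artifact_object='/tmp/features.parquet')
```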
YummyWhale40 you mean like continue training?
https://github.com/allegroai/trains/issues/160
No it will not 🙂 the closer is closer to the actual print.
That said, I'm sure it would not be complicated to add.
But I have to wonder: this will really create a mess in the console log. If someone wants it, it will be global (i.e. also in the visible console, not only in the backend), so the case where the console on the machine itself is "clean" but the backend log is full of debug stuff is not clear to me.
Hi WickedElephant66
So I'm trying to upload an artefact to clearml's fileserver (I have a self-hosted clearml server running),
Are you trying to upload an artifact? If so I would do:
task.upload_artifact('local file', artifact_object="/path/to/file")
Or is it about Model files?
You can alst check how to upload artifacts / models here:
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts.py
https://github.com/allegroai/clearml/blob/master/examples/reporti...
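Putting it together, a minimal sketch (project/task names and the path are placeholders):
```python
from clearml import Task

task = Task.init(project_name='examples', task_name='artifact upload')  # placeholder names

# the file is uploaded to the files server
# (or to whatever output destination is configured)
task.upload_artifact(name='local file', artifact_object='/path/to/file')
```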
Hi @<1523702932069945344:profile|CheerfulGorilla72>
This is a property on the Model object
model.published
Not sure why we do not have it here...
None
(I'll ask them to fix that)
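For reference, a minimal sketch of reading that property (the model id is a placeholder, copied from the UI):
```python
from clearml import Model

model = Model(model_id='<your-model-id>')  # placeholder id
print(model.published)  # True if the model has been published
```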
logger.report_scalar("loss", "train", iteration=0, value=100)
logger.report_scalar("loss", "test", iteration=0, value=200)
So the thing with IAM roles: they are designed to allow AWS instances to get "automatic" permissions (based on the IAM role). They are not actually designed to generate a key/secret, as I think the lifetime is by default relatively short. Since the actual request to S3 comes from the client browser (i.e. outside of the AWS cluster), the IAM role cannot apply, and you have to provide the key/secret. The easiest way is to generate S3 keys regardless of the IAM roles, to be used with the clients (sp...
trains[azure] gives you the possibility to do the following:
from trains import StorageManager
my_local_cached_file = StorageManager.get_local_copy('azure://bucket/folder/file.bin')
This means you do not have to manually download stuff and maintain the local cache; the StorageManager will do that for you.
If you do not need that ability, there is no need to install trains[azure]; you can just install trains.
Unfortunately, we haven't had the time to upgrade to the Azure storage v...
Hi EnviousStarfish54
Artifacts are stored per experiment; that means that, storage-wise, every experiment uploading an artifact (even if it is the same file content as a previous execution) will create a new file on the central storage (the default being the trains-server).
As for the preferred way to share data / artifacts: where do you have your trains server? Is it local? Cloud? How do you access it from home, VPN?
The agents are docker containers, how do I modify the startup script so it creates a queue?
Hmm actually not sure about that, might not be part of the helm chart.
So maybe the easiest is:
from clearml.backend_api.session.client import APIClient
c = APIClient()
c.queues.create(name="new_queue")
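If this runs on every container start, a slightly more defensive sketch could first check whether the queue already exists (the queue name is a placeholder):
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
# create the queue only if it is not there yet
if not client.queues.get_all(name="new_queue"):
    client.queues.create(name="new_queue")
```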
Seems like the network inside the running code cannot access localhost (even though you have --network=host). Could you test it with the machine's IP?
(Actually the best practice is to add a name to the machine (in your hosts file), so that if later you move the server, all the links will be valid)
DilapidatedDucks58 I see ...
This might be more complicated than one would imagine. A simple solution might be to store a snapshot of the values every time we reach a new maximum; a quick hack might be to add it as text in one of the task's parameters or properties (that we can later add to the table as a custom column).
wdyt?
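As a sketch of that quick hack (assuming Task.set_user_properties is available in your clearml version; names and values are made up):
```python
from clearml import Task

task = Task.init(project_name='examples', task_name='best-metric demo')  # placeholder names

best_accuracy = 0.0
for accuracy in [0.71, 0.78, 0.83, 0.81]:  # dummy values
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        # snapshot the running maximum as a user property,
        # so it can be shown as a custom column in the experiment table
        task.set_user_properties(best_accuracy=str(best_accuracy))
```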
You can already sort and filter experiments based on any hyperparameter or metric that the experiment reports, so there is no need for any custom query language. Also, any filtered/sorted table can be shared exactly as it is, so you can create leaderboards and share specific filters. You can also use the search bar to filter based on experiment name / comment. Tags will be added soon as well 🙂
An example of custom columns is here (the screen grab is a bit old, now there is als...