
Did you experience any drop in performance using forkserver?
No, seems to be working properly for me.
If yes, did you test the variant suggested in the pytorch issue? If yes, did it solve the speed issue?
I haven't tested it, that said it seems like a generic optimization of the DataLoader
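For readers unfamiliar with the start method discussed above, here is a minimal sketch of selecting forkserver with the Python standard library. The helper name pick_context is hypothetical. forkserver is not available on every platform, so the sketch falls back to the platform default when it is missing.

```python
import multiprocessing as mp


def pick_context(preferred="forkserver"):
    """Return a multiprocessing context using `preferred` when the platform supports it."""
    if preferred in mp.get_all_start_methods():
        return mp.get_context(preferred)
    # Fall back to the platform default (for example "spawn" on Windows).
    return mp.get_context()


if __name__ == "__main__":
    ctx = pick_context()
    print(ctx.get_start_method())
```

PyTorch's DataLoader accepts a multiprocessing_context argument, which is one place such a context could be passed.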
You can set torch to be installed last:
post_packages: ["horovod", "torch"]
Which will make sure the "trains-agent" version (the one you specified in the "installed packages") will be installed last.
Hi UnevenDolphin73
Is there an easy way to add a link to one of the tasks panels? (as an artifact, configuration, info, etc)?
You can add a link as an artifact, that is probably the easiest: task.upload_artifact(name="just link", artifact_object="
")
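A small sketch of wrapping that call, assuming `task` is an object exposing upload_artifact(name=..., artifact_object=...) as in the answer above. The helper name attach_link and the example URL are hypothetical, and the URL check is an added convenience rather than anything ClearML requires.

```python
def attach_link(task, name, url):
    """Register `url` as an artifact on `task`, using the call shown in the answer."""
    # Guard against accidentally passing a local path or an empty string.
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"expected an http(s) URL, got: {url!r}")
    task.upload_artifact(name=name, artifact_object=url)


# Usage with a real task (requires the clearml package and a configured server):
#   from clearml import Task
#   task = Task.init(project_name="examples", task_name="link demo")
#   attach_link(task, "just link", "https://example.com/report")
```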
EDIT: And follow up regarding the dataset. As discussed somewhere previously, the datasets are now automatically moved to a hidden "sub-project" prefixed with .datasets. This creates several annoyances that I...
Hi CharmingBeetle38
On the base task, do you see those arguments under the Configuration tab?
Also, if they are under the Args section, you should add the "Args/" prefix to the parameter names in the HP optimization (this is how you differentiate between the sections)
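To illustrate the prefixing described above, here is a minimal sketch that turns bare argument names into section-qualified names. The helper with_section and the parameter names (lr, batch_size, General/seed) are hypothetical; only the "Args/" convention comes from the answer.

```python
def with_section(name, section="Args"):
    """Prefix a hyperparameter name with its section, e.g. 'lr' -> 'Args/lr'."""
    # Names that already carry a section are left untouched.
    if "/" in name:
        return name
    return f"{section}/{name}"


names = [with_section(n) for n in ["lr", "batch_size", "General/seed"]]
print(names)  # ['Args/lr', 'Args/batch_size', 'General/seed']
```

The qualified names are what you would then use when defining the parameters to optimize.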
CurvedHedgehog15 there is no need for: task.connect_configuration(configuration=normalize_and_flat_config(hparams), name="Hyperparameters")
Hydra is automatically logged for you, no?!
@<1546303254386708480:profile|DisgustedBear75> I think this was a UI bug, they are just releasing a new version that fixes it (i.e. a server version). Are you running a self-hosted server?
Hi @<1523701868901961728:profile|ReassuredTiger98> when you get to it...
please download the wheel, then install it with
pip3 install -U clearml_agent-0.17.3rc0-py3-none-any.whl
Then run the daemon with the additional --debug argument, basically:
clearml-agent --debug daemon --foreground ...
Once the agent is running please send the Task's log from your console 🙂
ElegantCoyote26 can you browse to http://localhost:8080 on the machine that was running the trains-init ?
Seems like the network inside the running code cannot access localhost (even though you have --network=host). Could you test it with the machine's IP?
(Actually the best practice is to add a name to the machine (in your hosts file), so that if later you move the server, all the links will be valid)
Hi LovelyHamster1
As you noted, passing overrides in Args/overrides, for example ['training.max_epochs=1000'], should work when running with the agent.
Could you verify with the latest RC? There was a fix to support the latest Hydra version: pip install clearml==0.17.5rc5
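As an illustration of the override format mentioned above, here is a small sketch that splits key=value strings into a dictionary. This is not Hydra's own parser (Hydra's override syntax is richer); parse_overrides is a hypothetical helper, and values are kept as strings.

```python
def parse_overrides(overrides):
    """Split 'dotted.key=value' strings into a {key: value} dict (values stay strings)."""
    parsed = {}
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"not a key=value override: {item!r}")
        parsed[key] = value
    return parsed


print(parse_overrides(["training.max_epochs=1000"]))  # {'training.max_epochs': '1000'}
```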
Will using Model.remove completely delete from storage as well?
Correct, see the argument delete_weights_file=True
You mean the entire organization already has Kubeflow, or to better organize something (if this is the second, what are we organizing, pipelines?)
Ohh so you are saying you can store it properly, but only editing in the UI is limited ? (Maybe this is just a UI thing)
We should probably change it so it is more human readable 🙂
I want to be able to access the data just avoid reporting the experiment results
Yes, you are correct 😞
If you just want to skip the logging you can always add an if to the Task.init call ?!
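A minimal sketch of the "if around Task.init" idea, assuming a debug switch read from an environment variable. The variable name SKIP_CLEARML and the helper name are hypothetical; the clearml import is deferred so the debug path does not need the package.

```python
import os


def init_task_unless_debugging(project_name, task_name, env_var="SKIP_CLEARML"):
    """Return a ClearML Task, or None when the debug switch is set."""
    if os.environ.get(env_var) == "1":
        # Debug run: skip experiment logging entirely.
        return None
    # Imported lazily so debug runs work without clearml installed.
    from clearml import Task

    return Task.init(project_name=project_name, task_name=task_name)
```

Code that later uses the task (for example to upload artifacts) would need to handle the None case.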
I’ll check if I could wrap the code in something that calls the Task.delete if debugging
Whatever you think works best for you, I was genuinely curious 🙂
To me (personally) it is helpful to have a log even while debugging (comparing to previous runs etc, trying to see what went wrong even on a console output level). When I'm done I just search for everything I worked on select all, and archive them. Then a cleanup service in the background clears all the archived Tasks once they ar...
BTW: Basically just call Task.init(...), the rest is magic 🙂
So obviously that is the problem
Correct.
ShaggyHare67 how come the "installed packages" are now empty ?
They should be automatically filled when executing locally?!
Any chance someone mistakenly deleted them?
Regarding the python environment, trains-agent is creating a new clean venv for every experiment. If you need, you can set in your trains.conf: agent.package_manager.system_site_packages: true
https://github.com/allegroai/trains-agent/blob/de332b9e6b66a2e7c67...
Hi @<1727497172041076736:profile|TightSheep99>
Yes it can, it will upload the meta-data as well as the files (it will also do de-dup and will not upload files that already exist in the dataset, based on the hash of the file content)
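To make the de-dup idea concrete, here is a generic sketch of skipping files whose content hash is already known. It is an illustration only, not ClearML's implementation: the choice of SHA-256 and the helper names file_hash and files_to_upload are assumptions.

```python
import hashlib
from pathlib import Path


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_upload(paths, known_hashes):
    """Return the paths whose content hash is not already in `known_hashes`."""
    seen = set(known_hashes)
    pending = []
    for path in paths:
        h = file_hash(path)
        if h not in seen:
            # First time this content appears: schedule it and remember the hash.
            seen.add(h)
            pending.append(Path(path))
    return pending
```

Two files with identical content produce the same hash, so only the first one is scheduled for upload.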
For future readers, see discussion here:
https://clearml.slack.com/archives/CTK20V944/p1629840257158900?thread_ts=1629091260.446400&cid=CTK20V944
AbruptWorm50 can you send full image (X axis is missing from the graph)
DefeatedOstrich93 many thanks I was able to reproduce it (basically newly added files caused git apply to fail)
Fix will be part of the next clearml-agent RC
Yes this seems like it is stuck, could you test with the demo server ?
(basically remove the clearml.conf it will connect automatically)
1e876021bbef49a291d66ac9a2270705
just make sure you reset it 🙂
Hi JitteryCoyote63
I change the project.default_output_destination? I tried setting it to None but it is not updated
How did you try to change it? And where do you see the effect?
I can definitely see your point from the "DevOps" perspective, but from the user perspective it puts the "liability" on me to "optimize" the resource, which to me sounds a bit much to put on my tiny shoulders, I just have a general knowledge of what I need. For example lots of CPUs (because I know my process scales well with more CPUs), or large memory (because I have an entire dataset in memory). Personally (and really only my personal perspective), I'd rather have the option to select from a...
and those env variables are credentials for ClearML. Since they are taken from k8s secrets, they are the same for every user.
Oh ...
I can create secrets for every new user and set env variables accordingly, but perhaps you see a better way out?
So the thing is, if a User spins the k8s job, the user needs to pass their credentials (so the system knows who it is)... You could just pass the user's key/secret (not nice, but probably not a big issue, as everyone is an Admin anyhow,...