GrievingTurkey78 , can it be a heavy calculation that takes time? ClearML has a fallback to time instead of iterations if a certain timeout has passed. You can configure it with task.set_resource_monitor_iteration_timeout(seconds_from_start=<TIME_IN_SECONDS>)
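For example (the timeout value here is arbitrary, adjust to your workload):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="heavy-calculation")

# Fall back to wall-clock time for resource monitoring if no iteration
# was reported within the given number of seconds (1800 is just an example)
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
```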
The "template" task
JitteryCoyote63 , if you go to a completed experiment, do you only see the package installation stage in the log?
What OS/ClearML-Agent are you running?
Interesting! Do they happen to have the same machine name in the UI?
This should be a good starter; googling a bit more on how SSH works will point you in the right direction 🙂
Did you set your password or an access token? Also, please try sticking to a single thread per topic rather than multiple messages; it spams the channel
Sure, if you can post it here or send in private if you prefer it would be great
Does this happen on different python versions?
JitteryCoyote63 , doesn't seem to happen to me. I'll try raising a clean server and see if this happens then. You're running with 1.2, correct?
So, if I pull this file from the S3 bucket, I can conclude which chunk I should download to get a specific file. Am I wrong?
I think you're right. Although I'm not sure if you can decompress individual chunks - worth giving it a try!
I also thought ClearML writes this mapping (state.json) into one of its databases: Mongo, Redis, or Elasticsearch.
I think the state.json is saved like an artifact, so its contents aren't really exposed in any of the DBs
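If you want to check that yourself, a quick sketch for pulling state.json from the bucket and inspecting it could look like this (bucket name and key are placeholders, and the exact JSON layout isn't documented here, so inspect it interactively):
```python
import json

import boto3  # assumes AWS credentials are already configured

# Placeholders - point these at your dataset's actual state.json location
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-clearml-bucket", Key="datasets/<dataset-id>/state.json")
state = json.loads(obj["Body"].read())

# Print the top-level keys to see how files map to chunks
print(list(state.keys()))
```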
Yes you can, see the examples here 🙂
GrievingTurkey78 , do you have iterations stated explicitly somewhere in the script?
Hi @<1691983266761936896:profile|AstonishingOx62> , it's not only the output destination field but everywhere artifacts/debug samples/datasets were saved. This is usually logged in Mongo, so you would need to run a migration script to change all the URLs to the new IP
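Something along these lines could serve as a starting point - the database and collection names and the connection string are assumptions here, so verify them against your deployment and back everything up before touching the data:
```python
from pymongo import MongoClient

OLD = "http://10.0.0.1:8081"   # old server address (placeholder)
NEW = "http://10.0.0.2:8081"   # new server address (placeholder)

# Database / collection names are assumptions - check your ClearML server deployment
client = MongoClient("mongodb://localhost:27017")
db = client["backend"]


def replace_urls(value):
    """Recursively replace the old URL prefix anywhere inside a document."""
    if isinstance(value, str):
        return value.replace(OLD, NEW)
    if isinstance(value, dict):
        return {k: replace_urls(v) for k, v in value.items()}
    if isinstance(value, list):
        return [replace_urls(v) for v in value]
    return value


for doc in db["task"].find():
    updated = replace_urls(doc)
    if updated != doc:
        db["task"].replace_one({"_id": doc["_id"]}, updated)
```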
Hi @<1695969549783928832:profile|ObedientTurkey46> see below 🙂
Hi, I have two instances of clearml-server that I would like to merge into one. Is there any way to do that without losing tracked experiments, datasets or artefacts?
This is not trivial, you would have to merge the databases somehow.
Additionally, do you know if it is possible to back up clearml-server without shutting it down? If shutdown is mandatory, I have to force the data scientists to stop traini...
Hi @<1523701122311655424:profile|VexedElephant56> , do you get the same response when you try to run a script with Task.init() without an agent on that machine?
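Something minimal like this is enough to test it (project/task names are just placeholders):
```python
from clearml import Task

# Minimal connectivity test: if this registers a task in the web UI,
# the SDK side of the machine is configured correctly
task = Task.init(project_name="connectivity-test", task_name="sdk-check")
print("Task created:", task.id)
```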
You'll need ES/Mongo to run the ClearML server
Are you sure you migrated all the data correctly?
Hi @<1734020162731905024:profile|RattyBluewhale45> , what version of PyTorch are you specifying?
Hi @<1831502554446434304:profile|TestyKitten53> , what if you set it to true? Do you get the same errors?
GrievingTurkey78 , I'm not sure. Let me check.
Do you have CPU/GPU tracking through both PyTorch Lightning AND ClearML reported in your task?
Do you mean reporting scalars with TensorFlow OR having the reported TensorFlow scalars show up on ClearML?
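For reference, reporting a scalar manually to ClearML (independent of any TensorFlow auto-logging) looks roughly like this - the project/task/title/series names are placeholders:
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="scalar-report")
logger = task.get_logger()

# Report one scalar point per step; ClearML groups points under title/series
for step in range(10):
    logger.report_scalar(title="loss", series="train", value=1.0 / (step + 1), iteration=step)
```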
unrelated to the agent itself
Hi MammothParrot39 , what command do you run the agent with?
Interesting. I'll try to reproduce and see if it occurs to me as well 🙂
I'd suggest using the agent in --docker mode
Hi @<1578555761724755968:profile|GrievingKoala83> , what happens if you rerun this via the webUI ?
Hi @<1534344462161940480:profile|QuaintSeal61> , have you upgraded to a new version? Are you self hosted or using the community server? Also, can you elaborate on which part of it is slow? 🙂
Hi @<1715175986749771776:profile|FuzzySeaanemone21> , can you provide a log of the run? Also some code snippet that reproduces this behavior on your side?