Is this a bug, or an issue with clearml not working correctly with hydra?
It might be a bug?! Hydra is fully supported, i.e. logging the state and allowing you to change the Arguments from the UI.
Is this example working as expected ?
https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py
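For reference, a minimal sketch of the pattern that example follows (not its exact contents), assuming hydra-core 1.1.x and a config.yaml next to the script; project/task names are placeholders:
```
from clearml import Task
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path=".", config_name="config")
def my_app(cfg: DictConfig) -> None:
    # Task.init hooks into Hydra, so the composed config is logged
    # and can be overridden from the UI when the task is re-run by an agent
    Task.init(project_name="examples", task_name="hydra example")  # placeholder names
    print(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    my_app()
```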
If you're referring to the run executed by the agent, it ends after this message because my script does not get the right args and so does not know what to...
SourLion48 you mean the wraparound ?
https://github.com/allegroai/clearml/blob/168074acd97589df58436a3ec122a95a077620c2/docs/clearml.conf#L33
I'm wondering what happens if I were to host the instance and one of these were to go down from time to time in production, as the deployments provided by the helm chart are not redundant.
Long story short, it will break the clearml-server. Please do not take them down; if you do need to do that, also take down the clearml-server. The Python clients will wait until it is up again, so no session would be destroyed
It's always the details... Is the new Task running inside a new subprocess ?
basically there is a difference between:
- a remote task spawning new tasks (as subprocesses, or as jobs on a remote machine), with the remote task still running
- a remote task being replaced by a spawned task (same process?!)
UnevenDolphin73 am I missing a 3rd option? Which of these is your case?
p.s. I have a suspicion that there might be a misuse of "Task" here?! What are you considering a Task? (from clearml perspective a Task...
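For context, a minimal sketch of the first option above, spawning a child job from a running task by cloning a template task and enqueuing it (the template id and queue name are placeholders):
```
from clearml import Task

# the running (parent) task, e.g. already executed by an agent
parent = Task.current_task()

# clone an existing template task and enqueue it for an agent to pick up
child = Task.clone(source_task="<template_task_id>", name="spawned child", parent=parent.id)
Task.enqueue(child, queue_name="default")  # placeholder queue name
```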
Hi AttractiveShrimp45
Well, I would use Task.connect to add a section with any configuration you are using. For example:
Task.current_task().connect(my_dict_with_conf_for_data, name="dataset51")
wdyt?
you are correct, I was referring to the template experiment
Correct 🙂
btw: my_dict_with_conf_for_data can be any object, not just a dict. It will list all the properties of the object (as long as they do not start with _)
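For example, a minimal sketch of both variants (project, task, and section names are placeholders):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="connect demo")  # placeholder names

# connect a plain dict as its own configuration section
data_conf = {"version": 51, "path": "/data/dataset51", "shuffle": True}
task.connect(data_conf, name="dataset51")


# connecting an object works too: its properties (not starting with "_") are listed
class TrainConf:
    def __init__(self):
        self.lr = 0.001
        self.batch_size = 32
        self._internal = "skipped"  # leading underscore, so not logged


task.connect(TrainConf(), name="training")
```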
Please send the full log, I just tested it here, and it seems to be working
Thanks @<1523702652678967296:profile|DeliciousKoala34> I think I know what the issue is!
The container has 1.3.0a and you need 1.3.0, this is why it is re-downloading (I'll make sure the agent can sort it out, because this is Nvidia's version; in reality it should be a perfect match)
I still can't get it to work... I couldn't figure out how I can change the clearml version in the runtime of the Cleanup Service, as I'm not in control of the agent that executes it
Let's take a step back. Let's remove the clearml-services from the docker compose for a second, and run it manually (then you can control everything). Once you have it running manually, let's try to replicate the setup back to the docker compose, make sense ?
BTW from the log you attached:
File "/root/.clearml/venvs-builds/3.6/lib/python3.6/site-packages/clearml/storage/helper.py", line 218, in StorageHelper
_gs_configurations = GSBucketConfigurations.from_config(config.get('google.storage', {}))
This means it tries to remove an artifact from a Task; that artifact is probably in GS (I'm assuming, because it is using the GS api), and the cleanup service is missing the GS configuration.
WackyRabbit7 is that possible ?
What do you say I manually kill the services agent and launch one myself?
Makes sense 🙂
Can't figure out what made it get to this point
I "think" this has something to do with loading the configuration and setting up the "StorageManager".
(in other words setting the google.storage)... Or maybe it is the lack of google storage package?!
Let me check
Very odd, I still can't reproduce. This is just the cleanup service running without anything else ?
What's the clearml version it is using ?
I'm getting: hydra_core == 1.1.1
What's the setup you have? python version, OS, Conda yes/no?
In the Task log itself it will say the version of all the packages. Basically I wonder, maybe it is using an older clearml version, and this is why I cannot reproduce it...
I'm saying that because in the task under "INSTALLED PACKAGES" this is what appears
This is exactly what I was looking for. Thanks!
Yes that makes sense, I think this bug was fixed a long time ago, and this is why I could not reproduce it.
I also think you can use a later version of clearml 🙂
Edit the cloned version and enqueue it?
HealthyStarfish45 this sounds very cool! How can I help?
is the "installed packages" part editable? good to know
Of course it is, when you clone a Task everything is Editable 🙂
Isn't it a bit risky manually changing a package version?
worst case it will crash quickly, and you reset/edit/enqueue 🙂
(Should work though)
Yes, sorry, that wasn't clear 🙂
or at least stick to the requirements.txt file rather than the actual environment
You can also force it to log the requirements.txt with:
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
task = Task.init(...)
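For clarity, a minimal sketch; as far as I can tell the freeze call has to come before Task.init for it to take effect (project/task names are placeholders):
```
from clearml import Task

# log the contents of requirements.txt instead of freezing the whole environment
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
task = Task.init(project_name="examples", task_name="pinned requirements")  # placeholder names
```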
When I look at the details, model artifact in the ClearML UI, it's been saved the usual way, and no tags that I added in the OutputModel constructor are there.
Did you disable the autologging ? Are you saying the tags not appearing is a bug (it might be) ?
Also, I don't mind auto logging, as long as I have control over publishing the model or not directly from that script, and adding tags etc., like with OutputModel.
Sure, you can publish models / add tags etc, either from the UI or pr...
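For illustration, a minimal sketch of registering a model with tags and publishing it from the script (file, project, and tag names are placeholders):
```
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="manual model")  # placeholder names

# register a model on the task, with tags set at construction time
model = OutputModel(task=task, name="my-model", tags=["baseline", "v1"])
model.update_weights(weights_filename="model.pt")  # placeholder weights file

# publish it (instead of leaving it as a draft) once you are happy with it
model.publish()
```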
DrabCockroach54 notice here there is no aarch64 wheel for anything other than python 3.5...
(and in both cases only py 3.5/3.6 builds, everything else will be built from code)
https://pypi.org/project/pycryptodome/#files
Hi CleanPigeon16
Put the specific git into the "installed packages" section
It should look like:... git+
...
(No need for the specific commit, you can just take the latest)
Hi SpotlessFish46 ,
Is the artifact already in S3 ?
Is the S3 configured as the default files_server in the trains.conf?
You can always use the StorageManager upload to wherever and register the url on the artifacts.
You can also programmatically change the artifact destination server to S3, then upload the artifact as usual.
What would be the best match for you?
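For illustration, a minimal sketch of both options (bucket path and file names are placeholders):
```
from clearml import Task, StorageManager

task = Task.init(
    project_name="examples",
    task_name="s3 artifacts",
    output_uri="s3://my-bucket/clearml",  # placeholder: output models/artifacts go here instead of the files_server
)

# option 1: change the destination via output_uri, then upload the artifact as usual
task.upload_artifact(name="dataset", artifact_object="data.csv")  # placeholder local file

# option 2: push the file yourself with StorageManager and keep the returned URL
url = StorageManager.upload_file(local_file="data.csv", remote_url="s3://my-bucket/clearml/data.csv")
print(url)
```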
Hi SteadySeagull18
However, it seems to be entirely hanging here in the "Running" state.
Did you set an agent to listen to the "services" queue ?
Someone needs to run the pipeline logic itself; it is sometimes part of the clearml-server deployment but not a must
This is a part of a bigger process which takes quite some time and resources; I hope I can try this soon, if this will help get to the bottom of this
No worries, if you have another handle on how/why/when we lose the current Task, please share 🙂