You can also increase the limit here:
https://github.com/allegroai/clearml/blob/2e95881c76119964944eaa0289549617e8afeee9/docs/clearml.conf#L32
CurvedHedgehog15 is it plots or scalars you are after ?
Shout-out to Emilio for quickly stumbling on this rare bug and letting us know. If you have a feeling your process is stuck on exit, just upgrade to 1.0.1 😉
Thanks ShakyJellyfish91 ! please let me know what you come up with, I would love for us to fix this issue.
Hi @<1720249421582569472:profile|NonchalantSeaanemone34>
Is it possible to read data directly from server w/o using get_local_copy()?
Do you mean an artifact? What does "direct" mean here?
tasks.add_or_update_artifacts/v2.10 (Invalid task status: expected=created, status=completed)>
Hi UpsetCrow72
How come you are trying to sync a "completed" (finalized) dataset?
NaughtyFish36
No module named 'leap.learn.data_tools.merge_data.merge_data'
This seems to be the error, but I cannot see leap in the installed packages. Notice that if the Task has an "Installed Packages" section, the agent will use that, not the "requirements.txt". Only if this section is empty will it revert to the "requirements.txt" in the repo.
How did you create the Task in the first place?
I see that you added "leap" into the initial bashscript, actually you should add i...
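To make the precedence between "Installed Packages" and "requirements.txt" concrete, here is a minimal sketch in plain Python. It is only an illustration of the rule described above, not the agent's actual code; the function name `packages_to_install` and its arguments are made up for this example.

```python
from typing import List, Optional


def packages_to_install(installed_packages: Optional[List[str]],
                        repo_requirements: List[str]) -> List[str]:
    """Hypothetical helper: pick which package list would be used."""
    # Drop blank entries so an "empty" section really counts as empty.
    recorded = [p.strip() for p in (installed_packages or []) if p.strip()]
    if recorded:
        # The Task's "Installed Packages" section wins when it has content.
        return recorded
    # Otherwise fall back to the entries from the repo's requirements.txt.
    return [r.strip() for r in repo_requirements if r.strip()]
```

So if "leap" only appears in requirements.txt while the Task already has a populated "Installed Packages" section, it would never be installed under this rule.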
https://hub.docker.com/layers/nvidia/cuda/10.1-cudnn7-runtime-ubuntu18.04/images/sha256-963696628c9a0d27e9e5c11c5a588698ea22eeaf138cc9bff5368c189ff79968?context=explore
the docker image is missing the cudnn which is a must for TF to work 🙂
Sure thing!
BTW: not sure if it helps, but the SaaS version integrates with Genesis Cloud. I know they provide cheap GPUs, so it might be worth checking.
Can you share your StorageManager usage and the error you are getting?
I was hoping that there's a universal flag somewhere. Asking this because I want all the Models and Artifacts to be stored in one place and the users shouldn't have to edit their configuration files.
You mean like make sure all models/artifacts are always uploaded?
Ephemeral Dataset, I like that! Is this like, for example, splitting a dataset, then training/testing, and deleting it when done? Making sure the entire pipeline is reproducible, but without storing the data long term?
Hi HappyLion37
It seems that you are "reusing" the Tasks. Which means the second time you open them you are essentially resetting the old run and starting all over.
Try to do:
task1 = Task.init('examples', 'step one', reuse_last_task_id=False)
print('do stuff')
task1.close()
task2 = Task.init('examples', 'step two', reuse_last_task_id=False)
print('do some more stuff')
task2.close()
Hmmm.
Could you change the api_server: http://localhost:8008 to your host IP?
For example: api_server: http://192.168.1.11:8008
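If you want to double check the value before editing the config, something like this works as a quick sanity check. It is a small stdlib-only sketch; the helper names `points_to_localhost` and `with_host` are made up for illustration and are not part of any ClearML API.

```python
from urllib.parse import urlsplit, urlunsplit

# Host names that only resolve on the machine itself.
_LOOPBACK = {"localhost", "127.0.0.1", "::1"}


def points_to_localhost(url: str) -> bool:
    """Return True when the URL's host is a loopback name."""
    return urlsplit(url).hostname in _LOOPBACK


def with_host(url: str, host: str) -> str:
    """Return the same URL with its host swapped, keeping the port."""
    parts = urlsplit(url)
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, `with_host("http://localhost:8008", "192.168.1.11")` gives back the address in the example above. Note that the sketch ignores any user:password part of the URL.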
For that I need more info, what exactly do you need (or are you trying to achieve)?
Could it be there is a Task.init being called before this code snippet?
Hi AverageBee39
Did you set up an agent to execute the actual Tasks?
I have a question regarding running the code on the remote machine. Each time I run the code, I see the console in the ClearML server start downloading all the libraries I used in the code, and when I run another code the same thing happens. Why does it have to download all the libraries again and again?
I'm assuming you are referring to the installation, the downloaded python packages are cached.
You can turn on full caching by uncommenting the following line:
https://github.com/alleg...
@<1523701868901961728:profile|ReassuredTiger98> what do you have in the clearml.conf under "conda_channels" ?
Is this it ?
Thanks FlutteringWorm14 , checking 🙂
but perhaps it is worth adding to the docs page a hint to avoid using the CLEARML_TASK_ID env variable, perhaps I am not the only one to ever try it
Good idea, any thoughts on where ? I cannot find a trivial place to put these things
ShaggyHare67 I'm just making sure I understand the setup:
First "manual" run of the base experiment. It creates an experiment in the system, you see all the hyper parameters under General section.
trains-agent running on a machine.
HPO example is executed with the above HP as optimization parameters.
HPO creates clones of the original experiment, with different configurations (verified in the UI).
trains-agent executes said experiments, and they are not completed.
But it seems the paramete...
(I am not an expert on UI to be honest)
Same here 🙂 lol
we can implement this externally
What do you mean by that?
maybe this can cause the issue?
Not likely.
In the original pipeline (the one executed from the Pycharm) do you see the "Pipeline" section under Configuration -> "Config objects" in the UI?
I suspect it's the localhost - and the trains-agent is trying too hard to access the port, but for some reason does not report an error ...
Change to:
CLEARML_AGENT_GIT_USER: ${CLEARML_AGENT_GIT_USER:-my_git_user_here}
and the same for the password.
You can also just set the environment variables before launching docker-compose, whatever is more convenient for you
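To see how the two options interact, here is a rough Python emulation of the default-value substitution. Compose's documented form is `${VAR:-default}`, which uses the default when the variable is unset or empty. The `interpolate` function below is a hypothetical stand-in written for this example; it only handles that one form and is not how docker-compose is implemented.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} and ${NAME:-default}; other Compose forms are not handled.
_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def interpolate(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${NAME:-default} using env, falling back to the default."""
    env = os.environ if env is None else env

    def _sub(match: "re.Match[str]") -> str:
        current = env.get(match.group("name"), "")
        default = match.group("default") or ""
        # An unset or empty variable falls back to the default text.
        return current if current else default

    return _PATTERN.sub(_sub, value)
```

With nothing exported, `interpolate("${CLEARML_AGENT_GIT_USER:-my_git_user_here}", {})` returns the placeholder, and once the variable is set in the environment its value takes over, which is why either approach works.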
So what is the difference? Are both running from the same machine?
Hi @<1687643893996195840:profile|RoundCat60>
Are you running on AWS ?