That speed depends on model sizes, right?
in general yes
Hope that makes sense. This would not work under heavy loads, but e.g. we have models that are used only once a week. They would just stay unloaded until use, and could be offloaded afterwards.
but then you still might encounter a timeout the first time you access them, no?
Thanks EnviousStarfish54 !
Please let me know what you find 😀
Could be nice to write some automation
ShaggyHare67 in the HPO, the learning rate parameter should be (based on the above): General/training_config/optimizer_params/Adam/learning_rate
Notice the "General" prefix (it is case sensitive)
1633204289496 clearml-services DEBUG docker: invalid reference format.
This is a strange message; it looks like the execution command is not valid...
Seems like the network inside the running code cannot access localhost (even though you have --network=host). Could you test it with the machine's IP?
(Actually the best practice is to add a name for the machine in your hosts file, so that if you later move the server, all the links will remain valid)
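For example, a hypothetical /etc/hosts entry (the IP and name are placeholders):
```
192.168.1.42   clearml-server
```
Then the api_server / web_server / files_server entries in clearml.conf can point at clearml-server instead of the raw IP.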
What I'm trying to do is have the DSes use some lightweight base class that is independent of ClearML, while a framework holds all the ClearML-specific code. This will allow them to experiment outside of ClearML and only switch to it when they are in an OK state. It will also help avoid polluting ClearML spaces with half-baked ideas.
So you want the DS to manually tell the base class what to store?
then the base class will store it for them, for example with joblib
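Something like this rough sketch, maybe (the class and method names are made up for illustration):
```python
import joblib


class ExperimentBase:
    """Lightweight base class with no ClearML dependency."""

    def store_artifact(self, name, obj):
        # plain local storage, e.g. via joblib
        joblib.dump(obj, f"{name}.joblib")


class ClearMLExperiment(ExperimentBase):
    """ClearML-aware subclass, switched to once the code is in an OK state."""

    def __init__(self, project_name, task_name):
        from clearml import Task
        self.task = Task.init(project_name=project_name, task_name=task_name)

    def store_artifact(self, name, obj):
        super().store_artifact(name, obj)
        # also upload the stored file as a ClearML artifact
        self.task.upload_artifact(name=name, artifact_object=f"{name}.joblib")
```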
, is this the...
And are you sure you are pointing to the correct API server, and not mixing the API with the WEB address?
Also what's the clearml-server version?
Please send the full log, I just tested it here, and it seems to be working
Hi @<1719524641879363584:profile|ThankfulClams64>
I am using ClearML Pro and pretty regularly I will restart an experiment and nothing will get logged to ClearML.
I use ClearML with PyTorch 1.7.1, PyTorch Lightning 1.2.2 and TensorBoard auto-logging.
All ClearML packages are on the latest stable releases (clearml 1.7.4, clearml-agent 1.7.2).
Is this still happening with the latest clearml (clearml==1.16.3rc2)?
What is the TB version?
I remember a fix regarding lightning support
Also just making s...
It seems like the naming Task.create causes a lot of confusion (we are always open to suggestions and improvements). ReassuredTiger98 from your suggestion, it sounds like you would actually like more control in Task.init (let's leave Task.create aside, as its main function is not to log the currently running code, but to create an auxiliary Task).
Did I understand you correctly?
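Just to make the distinction concrete (the project/task names and repo URL are placeholders):
```python
from clearml import Task

# Task.init instruments and logs the currently running code
task = Task.init(project_name="examples", task_name="my experiment")

# Task.create only creates an auxiliary Task entry, e.g. pointing at a repo,
# without logging the current process
aux_task = Task.create(
    project_name="examples",
    task_name="auxiliary task",
    repo="https://github.com/user/repo.git",  # placeholder repo
)
```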
Sure thing, and I agree it seems unlikely to be an issue 🙂
Okay, this seems like a broken pip install on python3.6
Can you verify whether it fails in another folder? (Maybe it's a permissions thing; for example, if you run in docker mode, the permissions will be root, as the docker is creating those folders.)
MelancholyElk85 if you are manually adding models with OutputModel, then when you call update_weights(...)
the upload will start in the background (if the process ends, it will wait until the upload is completed). You can also specify auto_delete_file
which will delete the local copy once the upload completes
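A minimal sketch (the project/task names and weights path are placeholders):
```python
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="model upload")
output_model = OutputModel(task=task)

# the upload starts in the background; auto_delete_file removes the local
# copy once the upload completes
output_model.update_weights(
    weights_filename="model.pt",  # placeholder path
    auto_delete_file=True,
)
```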
Yes, the left side is the location of the file on the host machine, and the right side is the location of the file inside the docker. In our case it is the same location.
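i.e. something along the lines of (paths and image name here are placeholders):
```
docker run -v /data/models:/data/models my-image
```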
I'm not familiar with this one, I think you should be able to control it with:
CLEARML_AGENT__API__HTTP__RETRIES__BACKOFF_FACTOR
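e.g. (the backoff value here is just an example):
```
export CLEARML_AGENT__API__HTTP__RETRIES__BACKOFF_FACTOR=2.0
```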
Hi JuicyFox94
I think you are correct, this bug will explain the entire thing.
Basically what happens is that remote_execute stops the local run before the configuration is set on the Task. Then, running remotely, the code pulls the configuration, sees that it is empty, and does nothing.
Let me see if I can reproduce it...
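For context, the flow under discussion looks roughly like this (names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")

config = {"batch_size": 32}  # placeholder configuration
task.connect(config)  # the configuration that should end up on the Task

# stops the local process and re-launches the task on the given queue;
# the suspected bug: the local run is stopped before the configuration
# above is actually stored on the Task
task.execute_remotely(queue_name="default")
```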
With pleasure, I'll make sure we officially release RC1 soon :)
GreasyPenguin66 Nice !!!
Very cool setup, and kudos on making it work with multiple users!
Quick question, shouldn't the JUPYTERHUB_API_TOKEN env variable be enough to gain access to the server? Why did you need to add it to the 'nbserver-x.json' as well?
Hi @<1689446563463565312:profile|SmallTurkey79>
This call is to set an existing (already created) Task's requirements. Since the Task was just created, it waits for the automatic package detection before overriding them.
What you want is " Task.force_requirements_env_freeze
" (notice Class level, that need to be called Before Task.init)
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
task = Task.init(...)
The upload itself is in the background.
It should not take long to prepare the plot for sending. Are you experiencing a major delay?
Did you mean --detached?
Oops, yes, sorry, you are correct, it should be --detached 🙂
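For reference, a hypothetical agent invocation using the flag (the queue name is a placeholder):
```
clearml-agent daemon --queue default --detached
```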
Because we are working with very big files, having them stored at multiple locations is something we try to avoid
Just so I better understand, is this for storing files as part of a dataset, or as debug samples ?
In other words, can two different processes create the exact same file (image)?
FlatOctopus65
In my local environment, pipeline_package is installed in development mode
In order to install the package, you need to specify its git repo; this is how the pipeline knows where to bring it from.
Either install it locally with pip install git+https://github.com/... or add it to the packages argument of the Pipeline wrapper: packages = ["git+https://github.com/..."]
wdyt?
GrievingTurkey78 please feel free to send me code snippets to test 🙂
Hi EnviousStarfish54
After the pop-up, do you see the plot in the web UI?