Reputation
Badges 1
25 × Eureka!Oh, fork the repository (this will create a copy on your GitHub account), this is done from GitHub's web page
Then commit to your repository (on the master branch)
Then in the GitHub page of the repository on your account, you will have a green button suggesting you to PR it 🙂
It has to be alive so all the "child nodes" could report to it....
Understood, then I would use Task.remote_execution()
Basically :
task = Task.init(...)
# config some stuff
task.remote_execute(quque_name_here)
# this line will be executed on the remote machine only
This will both automatically log your code / repo with Task.init, and the call to Task.remote_execute will stop the local process (on your machine that runs the hydra sweep) and continue on the remote machine.
This will both allow you to use Hydra sweet & schedule / run on remote ...
Hi CheekyElephant36
First you need to run it once on your machine, once this is done (only a few steps is enough), you can one it and enqueue it. Then to actually connect the aws autoscaler (the part that spins machines and runs tasks) go to applications and select the aqs autoscaler.
Btw i think the next video will be about YOLO + autoscaler
Hi @<1523704757024198656:profile|MysteriousWalrus11>
"parents": [
"step_two",
"step_four"
],
Seems like step 5 depends on steps 2+4 , how did you create it? what did the console say ?
Could it be your not actually passing any output from step3 ? how is it dependent on it ?
Hi FunnyTurkey96
Let me check what's the status here
(BTW: Is this for a specific Task or for a specific Project?)
Lately I've heard of groups that do slices of datasets for distributed training, or who "stream" data.
Hmm so maybe a "glob" alike parameter for get_local_copy(select_filter='subfolder/*')
?
Okay, I think I lost you...
DilapidatedDucks58 you mean detect at which "iteration" the max value was reported, and then extract all the other metrics for that iteration ?
GrievingTurkey78 MagnificentSeaurchin79 do you guys want to start a PR branch we cal all work on it?
BTW: you will be loosing the comments 😞
So a bit of explanation on how conda is supported. First conda is not recommended, reason is, is it very easy to create a setup on conda that is un-reproducible by conda (yes, exactly that). So what trains-agent does, it tries to install all the packages it can first with conda (not one by one, because that will break conda dependencies), then the packages that it failed to install from conda, it will install using pip.
Hi @<1523701066867150848:profile|JitteryCoyote63>
Hi, how does
agent.enable_git_ask_pass
works
basically it pushes the pass through stdin to git when it asks (it is a git feature)
Then this is by default the free space on the home folder (`~/.clearml') that is missing free space
@<1547390422483996672:profile|StaleElk72> when you go to the dataset in the UI, and press on "Full Details" then go to the Artifacts tab, what is the link you see there?
Could it be the Args section of the task it clones does not have the "input_train_data" argument ?
Whats the trains server IP? It seems everything is configured with local host?
Glad to hear that! 🙂
So you could change it down the road if infra/hosting changes.
Internally this is doable and Enterprise edition supports it, at the end this is stored in DBs 🙂
Also in this case, I'm uploading the data to the public file server URL, but my k8 pod can't reach that for security reasons.
Yes, this is solvable as well (again sorry for pointing it, but only in the enterprise version), where you can specify per client or globally:
` path_substitution = [
# Replace regis...
Out of curiosity, what ended up being the issue?
Do you have any advice for this step, (monitoring)? I feel like it's not very well documented.
Yeah I think it is complicated.
I would start with the example here: None
Basically what it does is create histogram over time of the values the Rest API gets. Then in graphana it visualizes those values.
Notice that the request latency / frequency are automatically logged ...
@<1523703080200179712:profile|NastySeahorse61> / @<1523702868694011904:profile|AbruptCow41>
Is there a way to avoid each task to create a new environment?
You can just define CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
it will just use whatever you have there (notice it will totally ignore requirements.txt and "installed packages" on the Task)
BTW I would recommend turning on the venv caching, this is per docker/python/packages caching so the next time you are using th exact requi...
@<1523706266315132928:profile|DefiantHippopotamus88> seems like you are missing the ports 🙂
CLEARML_WEB_HOST="
"
CLEARML_API_HOST="
"
CLEARML_FILES_HOST="
"
Okay this seems correct...
Can you share both yaml files (server & serving) and env file?
Damn, okay I'll make sure we fix the order.
Could you verify the ~= works as intended (if the order id correct)
@<1523701868901961728:profile|ReassuredTiger98> what do you have in the clearml.conf under "conda_channels" ?
Is this it ?
None
Well, in that case, just change the order it should solve it (I'll make sure we have that as the default:
conda_channels: ["pytorch", "conda-forge", "defaults", ]
It should solve the issue 🙂