
but perhaps it is worth adding to the docs page a hint to avoid using the CLEARML_TASK_ID env variable, perhaps I am not the only one to ever try it
Good idea, any thoughts on where? I cannot find a trivial place to put these things
Perhaps it is the imports at the start of the script only being assigned to the first task that is created?
Correct!
However, when I split the experiment task out completely, it seems to have built the cloned task correctly.
Nice!!
I think it should look something like:
```
files {
  gsc {
    contents: """{"type": "service_account", "project_id": "ai-platform", "private_key_id": "9999", "private_key": "-----BEGIN PRIVATE KEY-----==\n-----END PRIVATE KEY-----\n", "client_email": "a@ai.iam.gserviceaccount.com", "client_id": "111", "auth_uri": "", "token_uri": "", "auth_provider_x509_cert_url": "", "client_x509_cert_url": ""}"""
    path: "~/gs.cred"
  }
}
```
Can you send the console output of this entire session please ?
Hi BroadMole98
What I think I am understanding about trains so far is that it's great at tracking one-off script runs and storing artifacts and metadata about training jobs, but doesn't replace kubeflow or snakemake's DAG as a first-class citizen. How does Allegro handle DAGgy workflows?
Long story short, yes you are correct. kubeflow and snakemake for that matter, are all about DAGs where each node is running a docker (bash) for you. The missing portions (for both) are:
How do I cr...
So the only difference is how I log in to the machine to start clearml
the only difference that I can think of is the OS environment variables in the two login types:
can you run `export` in the two cases and check the diff between them?
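A quick way to capture and compare the two environments (file names here are just placeholders; run each dump from its own login type, shown together here only to illustrate the flow):

```shell
# Dump the environment from each login type, then diff the dumps.
export > interactive_env.txt   # run this in the interactive SSH shell
export > service_env.txt       # run this in the other session type
diff interactive_env.txt service_env.txt || true   # lines that differ
```

Typical suspects are PATH, LD_LIBRARY_PATH, and CUDA-related variables.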
If there was an SSL issue it should log to console right?
correct, also the agent is able to report, so I'm assuming configuration is correct
@<1724960464275771392:profile|DepravedBee82> could you try to put the clearml import + Task.init at the top of your code?
Hi JollyChimpanzee19
What are the versions (clearml, TF, PT)? Also, could you add one more line from the stack trace (i.e. which call triggered the exception)?
@<1587253076522176512:profile|HollowPeacock33>
Is this a commercial ad? This seems out of scope for this channel
Can you expand?
Okay this seems correct...
Can you share both yaml files (server & serving) and env file?
What happens when you call:
```python
from clearml.backend_interface.task.repo import ScriptInfo
print(ScriptInfo._ScriptInfo__legacy_jupyter_notebook_server_json_parsing(None))
```
Hi OddShrimp85
If you pass output_uri=True to Task.init, it will upload the model automatically, or as you said, manually with the OutputModel class
and I have no way to save those as clearml artifacts
You could do (at the end of the code):
task.upload_artifact('profiler', Path('./fil-result/'))
wdyt?
thought the agent created a new conda env and installed all packages
It does, but I was asking what is written on the original Task (the one created when you executed the code on your laptop, not when the agent was executing it). When the agent executes the Task, it writes back all the packages of the entire venv it created; when the Task is run manually, it lists only the packages you import directly (i.e. `from package import ...` or `import package`; it actually analyses the code)
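The "analyses the code" part can be pictured with a small stdlib-only sketch (an illustration of the idea, not ClearML's actual implementation):

```python
import ast

def direct_imports(source: str) -> set:
    """Collect top-level package names imported by a script, the kind
    of information listed for a manually-run task."""
    packages = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                packages.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            packages.add(node.module.split(".")[0])
    return packages

print(direct_imports("import numpy as np\nfrom torch.nn import Linear"))
```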
My point...
Hi ContemplativeCockroach39
Assuming you wrap your model with a flask app (or using any other serving solution), usually you need:
1. Get the model
2. Add some metrics on runtime performance
3. Package in a docker

Getting a pretrained model is straightforward once you know either the creating Task or the Model ID
```python
from clearml import Task, Model

model_file_from_task = Task.get_task(task_id).models['output'][-1].get_local_copy()
# or
model_file_from_model = Model(model_id=<model_id>).get_local_copy()
```
...
now, I need to pass a variable to the Preprocess class
you mean for the construction ?
Hmm yes, that is a good point; maybe we should allow specifying a parameter on the model configuration to help with the actual type ...
ShallowCat10 Thank you for the kind words 🙂
so I'll be able to compare the two experiments over time. Is this possible?
You mean like match the loss based on "images seen" ?
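One way to get that comparison is to report the loss with "images seen" as the iteration value, so both runs share an x-axis. A plain-Python sketch of the re-keying (numbers and names are made up):

```python
def to_images_seen(losses, batch_size):
    """Re-key per-step losses by cumulative images seen, so runs with
    different batch sizes can be compared on the same x-axis."""
    return {(step + 1) * batch_size: loss for step, loss in enumerate(losses)}

run_a = to_images_seen([0.9, 0.7, 0.6], batch_size=32)
run_b = to_images_seen([0.95, 0.8, 0.65, 0.6], batch_size=24)
common = sorted(set(run_a) & set(run_b))  # x-values both runs hit
print(common)  # -> [96]
```

With ClearML you would pass that images-seen value as the `iteration` argument when reporting the scalar, and the two experiments line up in the comparison view.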
A single query will return whether the agent is running anything, and for how long, but I do not think you can get the idle time ...
Is there any contingency plan for an agent to continue running a task without reading the repository on the GitLab server?
Not sure what can be done ... any suggestions ?
At runtime, can I ask the agent to use some cached repository?
sometimes you will have it (the agent stores a cached copy), but I would hardly count on it, and it might be at different states on different machines...
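That cached copy comes from the agent's vcs cache; in the agent's clearml.conf it is controlled by something like this (path and option names from memory, so double-check against your config):

```
agent {
    vcs_cache {
        enabled: true
        path: ~/.clearml/vcs-cache
    }
}
```

Since it is only a cache, any particular worker may have it missing or at an older state.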
... (due to regular maintenance service, something I cannot control).
Maybe let "th...
Yey!
Out of curiosity, what's the workflow with snowflake?
SlipperyDove40
FYI:
args = task.connect(args, name="Args")
"Args" is a "kind of" reserved section for argparse. Meaning you can always use it, but argparse will also push/pull things from there. Is there any specific reason for not using a different section name?
Hi @<1691620877822595072:profile|FlutteringMouse14>
Yes, feast has been integrated by at least a couple of users, if I remember correctly.
Basically there are two modes: offline and online feature transformation. For offline, your pipeline is exactly what would be recommended. The main difference is online transformation, where I think feast is a great start
Yes it should
here is fastai example, just in case 🙂
https://github.com/allegroai/clearml/blob/master/examples/frameworks/fastai/fastai_with_tensorboard_example.py
Since pytorch is a special example (the agent will pick the correct pytorch based on the installed CUDA), the agent will first make sure the file is downloaded, and then pass the resolving to pip to decide if it is necessary to install. (Bottom line, we downloaded the torch for no reason, but it is cached, so no real harm done.) It might be that the second package needs a specific numpy version... this resolving is done by pip, not the agent specifically. Anyhow --system-site-packages is applicable o...
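For reference, the agent-side equivalent of --system-site-packages lives in clearml.conf (option name from memory, so verify against your agent's config file):

```
agent {
    package_manager {
        # let the task venv see packages already installed system-wide
        system_site_packages: true
    }
}
```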
AttractiveCockroach17
Can you print the configuration to console when you start the run (you will get a local print and then later the remote print)? Are they the same? Are the 3 runs the same (local / remote print)?
Hi @<1566596960691949568:profile|UpsetWalrus59>
just wondering - shouldn't the job still work if I didn't push the commit yet
How would that work? It does not know which commit to take, and it would also fail on applying the git diff, no?
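What the agent replays can be sketched with plain git (throwaway repo and file names made up): it checks out the recorded commit, then applies the stored uncommitted diff, so an unpushed commit leaves it with nothing to check out.

```shell
git init -q demo-repo && cd demo-repo
echo base > file.txt && git add file.txt
git -c user.email=a@b.c -c user.name=demo commit -qm "base"  # the pushed commit
echo change >> file.txt              # local, uncommitted edit
git diff > ../stored.diff            # the diff clearml records with the task
git checkout -q -- file.txt          # agent: clean checkout of the commit...
git apply ../stored.diff             # ...then re-apply the recorded diff
```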
GreasyPenguin14 what's the clearml version you are using, and which OS & Python?
Notice this happens on the "connect_configuration" that seems to be called after the Task was closed, could that be the case ?